Abstract

The paper presents the gossip interactive Kalman filter (GIKF) for distributed Kalman filtering for 
networked systems and sensor networks, where inter-sensor communication and observations occur at 
the same time-scale. The communication among sensors is random; each sensor occasionally exchanges 
its filtering state information with a neighbor depending on the availability of the appropriate network 
link. We show that under a weak distributed detectability condition: 1) the GIKF error process remains 
stochastically bounded, irrespective of the instability properties of the random process dynamics; and 
2) the network achieves weak consensus, i.e., the conditional estimation error covariance at a (uniformly) 
randomly selected sensor converges in distribution to a unique invariant measure on the space of positive 
semi-definite matrices (independent of the initial state). To prove these results, we interpret the filtered
states (estimates and error covariances) at each node in the GIKF as stochastic particles with local 
interactions. We analyze the asymptotic properties of the error process by studying the associated
switched (random) Riccati equation as a random dynamical system, the switching being dictated by a
non-stationary Markov chain on the network graph.
I. Introduction

A. Background and Motivation 

This paper presents the Gossip Interactive Kalman Filter (GIKF). GIKF is a linear distributed
estimator that filters noisy observations of a random process measured by a sparsely connected sensor 
network. Each sensor observes only a portion of the process, such that, acting alone, no sensor can resolve 
the signal. GIKF is fundamentally different from other distributed implementations of the Kalman filter 
([1], [2], [3], [4]) that employ some form of linear consensus on the sensor observations or estimates; in 
contrast, GIKF involves communication and observation sampling at the same time scale. GIKF runs at 
each sensor a local copy of the Kalman filter and achieves collaboration through occasional asynchronous 
state swaps between sensors at random time instants. At the random times when a sensor communicates 
with one of its randomly selected neighbors, the sensor swaps its previous state (its local Kalman filter 
state estimate and conditional error covariance) with the state of its neighbor, before processing the 
current observation. In other words, when communication is established, a sensor updates the state it 
receives from its neighbor with its present observation; otherwise, it updates its own previous state. 
Such collaboration or information exchange through state swapping is asynchronous over the network 
and occurs occasionally, dictated by the random network topology. Indeed, due to inherent environmental 
randomness, the underlying medium access control (MAC) protocol is randomized and often not known at 
the local sensor level. We assume that the sensor network uses a generic random communication protocol, 
see Section II, that subsumes the widely used gossiping protocol for real time embedded architectures, 
[5], and the graph matching based communication protocols for internet architectures, [6]. 

The paper establishes GIKF and studies its error properties. We define a weak distributed detectability 
condition¹ under which we show: 1) the GIKF error process remains stochastically bounded, irrespective
of the instability properties of the random process dynamics; and 2) the network achieves weak consensus, 
i.e., the conditional estimation error covariance at a (uniformly) randomly selected sensor converges in 
distribution to a unique invariant measure on the space of positive semi-definite matrices (independent
of the initial state). To prove these results, we interpret the filtered states at each node in the GIKF as
stochastic particles with local interactions and analyze the asymptotic properties of the error process by 
studying as a random dynamical system the switched (random) Riccati equation, the switching being 
dictated by a non-stationary Markov chain on the network graph. 

To study the information flows in the GIKF, we interpret the filtering states at each node as stochastic 
particles with controlled interactions. To prove the stochastic boundedness of the error process and the 
network weak consensus, we focus on these traveling states, which we refer to as tokens or particles, 
and not on the sequence of conditional error covariances at each sensor, which is not Markov. This 
particle point of view is reminiscent of the approach taken in fluid dynamics of studying the transport of 

1 This condition is required even by a centralized estimator (having access to all sensor observations over all time) to yield
an estimate with bounded error (for unstable systems).
a particle as it travels in the fluid (Lagrangian coordinates) rather than studying the transport at a fixed
coordinate in space (Eulerian coordinates), [7]. We show that the sequence of traveling states or particles 
evolves according to a switched system of random Riccati operators, the switching being dictated by a 
non-stationary Markov chain on the graph. A key contribution is the analysis of the resulting random 
Riccati equation (RRE). In this context, we note that the RRE arises in the literature in several practical 
filtering and control formulations with non-classical information. Prior work ([8], [9], [10], [11], [12], [13], 
[14], [15], [16], [17], [18]) mostly addresses qualitative properties of the RRE in terms of moment stability,
whereas recent approaches focus on understanding the limit behavior in terms of weak convergence ([19], 
[20], [21], [22], see also [23]). In this paper, we utilize a random dynamical systems formulation of the 
RRE; however, in contrast with our work in [19], [20], the switching sequence is no longer stationary. 
Several approximation arguments of independent interest are developed to tackle this non-stationary 
behavior and to establish the asymptotic distributional properties of the RRE. 

To summarize, the paper addresses two fundamental concerns in collaborative estimation in random 
environments. It introduces distributed observability for linear dynamical estimation and addresses the 
question of the minimal observation pattern (i.e., what the minimal number of sensors should be and what
they should observe) so that there exists a successful filtering scheme. The weak detectability condition
(introduced in Section II-A) resolves this question through the existence of a full rank network Gramian.
We show that satisfaction of the weak detectability condition leads to stochastic boundedness of the 
conditional filtering error at each sensor, irrespective of observability of individual sensors. The second 
concern addressed in the paper is that of robust information flow, which seeks to address the minimal 
communication required to maintain consistent (asymptotically) information dissemination in the network. 
The weak connectedness assumption formulated in Section II-A quantifies the rate of information flow 
(in random communication environments) as the mixing time of a particle undergoing a random walk 
in the network with appropriate statistics. The positive recurrence of this Markov chain translates to 
information dissemination at a sufficient rate to cope with the (possible) instability in signal dynamics 
and leads to weak consensus of the filtering errors. The notion of weak consensus introduced in the paper 
is the best form of consensus possible in such a setup because, as opposed to familiar scenarios (average
computation/static parameter estimation), in a dynamic situation it is not possible to accomplish almost
sure (pathwise) consensus of the estimate or error processes. Instead, the weak consensus we
establish shows that the error processes at different sensors converge in distribution to the same invariant 
measure. We do not characterize here this invariant measure as a function of the communication and 
observation policies; instead, we resolve the minimal conditions for the existence of such an invariant 
measure and hence conditions for the stability of the filtering error processes. 

We briefly summarize the organization of the rest of the paper. Subsection I-B sets up notation and
background material to be used in the paper. Section II sets up the problem and introduces the GIKF
algorithm together with the observability and connectivity assumptions in Subsection II-A. An interactive
particle interpretation and important preliminary results are in Subsection II-B. The main results regarding
the asymptotic properties of the GIKF are stated (without proof) and interpreted in Section III. To prove
these results, we first provide in Section IV a random dynamical system (RDS) formulation of the switching
iterates of the random Riccati equation arising in the GIKF. Appendix A recalls facts and results on
random dynamical systems needed in that section. The main results of the paper are proved in
Section VI. Two technical lemmas are proven in Appendix B. Finally, Section VII concludes the paper.

B. Notation and Preliminaries 

Let $\mathbb{R}$ be the reals; $\mathbb{R}^M$, the $M$-dimensional Euclidean space; $\mathbb{T}$, the integers; $\mathbb{T}_+$, the non-negative
integers; $\mathbb{N}$, the natural numbers; and $\mathcal{X}$, a generic space. For $B \subset \mathcal{X}$, $\mathbb{I}_B : \mathcal{X} \to \{0, 1\}$ is the indicator
function, i.e., 1 when its argument is in $B$ and zero otherwise; and $\mathrm{id}_{\mathcal{X}}$ is the identity function on $\mathcal{X}$.

Cones in partially ordered Banach spaces. We summarize facts and definitions on the structure of
cones in partially ordered Banach spaces. Let $\mathcal{V}$ be a Banach space (over the field of the reals) with a
closed (w.r.t. the Banach space norm) convex cone $\mathcal{V}_+$ and assume $\mathcal{V}_+ \cap (-\mathcal{V}_+) = \{0\}$. The cone $\mathcal{V}_+$
induces a partial order in $\mathcal{V}$, namely, for $X, Y \in \mathcal{V}$, we write $X \preceq Y$ if $Y - X \in \mathcal{V}_+$. In case $X \preceq Y$
and $X \neq Y$, we write $X \prec Y$. The cone $\mathcal{V}_+$ is called solid if it has a non-empty interior $\mathrm{int}\,\mathcal{V}_+$; in
that case, $\mathcal{V}_+$ defines a strong ordering in $\mathcal{V}$, and we write $X \ll Y$ if $Y - X \in \mathrm{int}\,\mathcal{V}_+$. The cone $\mathcal{V}_+$ is
normal if the norm $\|\cdot\|$ of $\mathcal{V}$ is semi-monotone, i.e., $\exists\, c > 0$ s.t. $0 \preceq X \preceq Y \Rightarrow \|X\| \leq c\|Y\|$. There
are various equivalent characterizations of normality, of which we note that the normality of $\mathcal{V}_+$ ensures
that the topology in $\mathcal{V}$ induced by the Banach space norm is compatible with the ordering induced by
$\mathcal{V}_+$, in the sense that any norm-bounded set $B \subset \mathcal{V}$ is contained in a conic interval of the form $[X, Y]$,
where $X, Y \in \mathcal{V}$. Finally, a cone is said to be minihedral if every order-bounded (both upper and lower
bounded) finite set $B \subset \mathcal{V}$ has a supremum (here bounds are w.r.t. the partial order).

We focus on the separable Banach space of symmetric $n \times n$ matrices, $\mathbb{S}^n$, equipped with the induced
2-norm. The subset $\mathbb{S}^n_+$ of positive semidefinite matrices is a closed, convex, solid, normal, minihedral
cone in $\mathbb{S}^n$, with non-empty interior $\mathbb{S}^n_{++}$, the set of positive definite matrices. The conventions above
denote the partial and strong ordering in $\mathbb{S}^n$ induced by $\mathbb{S}^n_+$.
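
The partial and strong orderings on symmetric matrices can be checked numerically through the eigenvalues of the difference. The following is a minimal sketch, assuming NumPy; the helper names `is_psd`, `loewner_leq`, and `loewner_ll` and the tolerance are illustrative choices, not part of the paper.

```python
import numpy as np

def is_psd(X, tol=1e-10):
    """True if the symmetric matrix X is positive semidefinite (up to tol)."""
    X = np.asarray(X, dtype=float)
    return bool(np.allclose(X, X.T) and np.linalg.eigvalsh(X).min() >= -tol)

def loewner_leq(X, Y, tol=1e-10):
    """Partial order: X precedes Y if Y - X lies in the cone of PSD matrices."""
    return is_psd(np.asarray(Y, dtype=float) - np.asarray(X, dtype=float), tol)

def loewner_ll(X, Y, tol=1e-10):
    """Strong order: Y - X lies in the interior of the cone (positive definite)."""
    D = np.asarray(Y, dtype=float) - np.asarray(X, dtype=float)
    return bool(np.allclose(D, D.T) and np.linalg.eigvalsh(D).min() > tol)

X = np.diag([1.0, 1.0])
Y = np.diag([2.0, 1.0])   # Y - X = diag(1, 0): PSD but singular
Z = np.diag([2.0, 3.0])   # Z - X = diag(1, 2): positive definite
W = np.diag([0.0, 2.0])   # W - X = diag(-1, 1): X and W are not comparable
```

Here `X` and `Y` are ordered but not strongly ordered, `X` and `Z` are strongly ordered, and `X` and `W` are incomparable, which illustrates that the ordering is only partial.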

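Probability measures on metric spaces: Let $(\mathcal{X}, d_{\mathcal{X}})$ be a complete separable metric space $\mathcal{X}$ with
metric $d_{\mathcal{X}}$; $\mathbb{M}(\mathcal{X})$ its Borel algebra; $B(\mathcal{X})$ the Banach space of real-valued bounded functions on $\mathcal{X}$,
equipped with the sup-norm, i.e., for $f \in B(\mathcal{X})$, $\|f\| = \sup_{x \in \mathcal{X}} |f(x)|$; and $C_b(\mathcal{X})$ the subspace of $B(\mathcal{X})$
of continuous functions. For $x \in \mathcal{X}$, the open ball of radius $\varepsilon > 0$ centered at $x$ is denoted by $B_\varepsilon(x)$,
i.e., $B_\varepsilon(x) = \{y \in \mathcal{X} \mid d_{\mathcal{X}}(y, x) < \varepsilon\}$. For any set $\Gamma \subset \mathcal{X}$, the open $\varepsilon$-neighborhood of $\Gamma$ is given by
$\Gamma_\varepsilon = \{y \in \mathcal{X} \mid \inf_{x \in \Gamma} d_{\mathcal{X}}(y, x) < \varepsilon\}$. It can be shown that $\Gamma_\varepsilon$ is an open set.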

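Let $\mathcal{P}(\mathcal{X})$ be the set of probability measures on $\mathcal{X}$. A sequence $\{\mu_t\}_{t \in \mathbb{T}_+}$ of probability measures in
$\mathcal{P}(\mathcal{X})$ converges weakly to $\mu \in \mathcal{P}(\mathcal{X})$ if $\lim_{t \to \infty} \langle f, \mu_t \rangle = \langle f, \mu \rangle$ for all $f \in C_b(\mathcal{X})$. By Portmanteau's
theorem, the above is equivalent to any one of the following:

[i] For all closed $F \in \mathbb{M}(\mathcal{X})$, $\limsup_{t \to \infty} \mu_t(F) \leq \mu(F)$.

[ii] For all open $O \in \mathbb{M}(\mathcal{X})$, $\liminf_{t \to \infty} \mu_t(O) \geq \mu(O)$.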
Weak convergence is denoted by $\mu_t \Rightarrow \mu$ and is also referred to as convergence in distribution. The
weak topology on $\mathcal{P}(\mathcal{X})$ generated by weak convergence can be metrized. In particular, e.g., [24], one
has the Prohorov metric $d_P$ on $\mathcal{P}(\mathcal{X})$, such that the metric space $(\mathcal{P}(\mathcal{X}), d_P)$ is complete, separable, and
a sequence $\{\mu_t\}_{t \in \mathbb{T}_+}$ in $\mathcal{P}(\mathcal{X})$ converges weakly to $\mu$ in $\mathcal{P}(\mathcal{X})$ iff $\lim_{t \to \infty} d_P(\mu_t, \mu) = 0$. The distance
between two probability measures $\mu_1, \mu_2$ in $\mathcal{P}(\mathcal{X})$ is computed as:

$$d_P(\mu_1, \mu_2) = \inf\{\varepsilon > 0 \mid \mu_1(F) \leq \mu_2(F_\varepsilon) + \varepsilon, \ \forall \text{ closed set } F\} \qquad (1)$$

II. Gossip Interactive Kalman Filter (GIKF) 

A. Problem setup 

Signal/Observation Model We consider a discrete-time linear dynamical system observed by a network
of $N$ sensors. The signal model is:

$$x_{t+1} = \mathcal{F} x_t + w_t \qquad (2)$$

where $x_t \in \mathbb{R}^M$ is the signal (state) vector with initial state $x_0$ distributed as a zero mean Gaussian
vector with covariance $P_0$, and the system noise $\{w_t\}$ is an uncorrelated zero mean Gaussian sequence
independent of $x_0$ with covariance $Q$. The observation $y_t^n \in \mathbb{R}^{m_n}$ at the $n$-th sensor at time $t$ is:

$$y_t^n = C_n x_t + v_t^n \qquad (3)$$

where $C_n \in \mathbb{R}^{m_n \times M}$ and $\{v_t^n\}$ is an uncorrelated zero mean Gaussian observation noise sequence
with covariance $\mathcal{R}_n \gg 0$. Also, the noise sequences at different sensors are independent of each other,
of the system noise process, and of the initial system state. Because of the limited capability of the sensors,
typically the dimension of $y_t^n$ is much smaller than that of the signal process, and the observation process
at each sensor is not sufficient to make the pair $\{x_t, y_t^n\}$ observable.² We envision a totally distributed
application where a reliable estimate of the signal process is required at each sensor.³ The sensors achieve
collaboration with each other by means of occasional communication with their neighbors, whereby they
exchange their filtering states (to be defined precisely). We assume that time is slotted and inter-sensor
communication and sensing (observation) take place at the same time-scale.
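
To make the signal and observation model of eqns. (2) and (3) concrete, the following is a minimal simulation sketch, assuming NumPy. The numerical values (a two-state unstable signal and two sensors that each observe a single coordinate) and the function name `simulate` are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate(F, Q, P0, C_list, R_list, T, rng):
    """Draw one trajectory of x_{t+1} = F x_t + w_t and y_t^n = C_n x_t + v_t^n."""
    M = F.shape[0]
    x = rng.multivariate_normal(np.zeros(M), P0)       # x_0 ~ N(0, P0)
    xs, ys = [], []
    for _ in range(T):
        xs.append(x)
        # One noisy observation per sensor; sensor n only sees the rows of C_n.
        ys.append([C @ x + rng.multivariate_normal(np.zeros(C.shape[0]), R)
                   for C, R in zip(C_list, R_list)])
        x = F @ x + rng.multivariate_normal(np.zeros(M), Q)
    return np.array(xs), ys

# Hypothetical example: both modes are unstable and each sensor sees one mode,
# so neither sensor can track the full state on its own.
F = np.array([[1.1, 0.0], [0.0, 1.05]])
Q = 0.1 * np.eye(2)
P0 = np.eye(2)
C_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
R_list = [np.array([[0.5]]), np.array([[0.5]])]

xs, ys = simulate(F, Q, P0, C_list, R_list, T=50, rng=np.random.default_rng(0))
```

`xs[t]` holds the state at time `t` and `ys[t][n]` the observation of sensor `n` at time `t`.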

Communication Model Communication among sensors is constrained by several factors including
proximity, transmit power, and receiving capabilities. We model the underlying communication structure
of the network in terms of an undirected graph $(V, \mathcal{E})$, where $V$ denotes the set of $N$ sensors and $\mathcal{E}$ is
the set of edges or allowable communication links between the sensors. The notation $n \sim l$ indicates

2 It is possible that some of the sensors have no observation capabilities, i.e., the corresponding $C_n$ is a zero matrix. Thus
the formulation easily carries over to networks of heterogeneous agents, consisting of 'sensors', which actually sense the field
of interest, and actuators, which implement local control actions based on the estimated field.

3 The term sensor network here refers to a network of agents (possibly distributed over a geographical region) with varied
functionalities. For example, some agents may be physical sensors while others may be remote actuators, in which case the
corresponding observation matrix $C_n$ is identically zero. In this paper, we use the term sensor to denote a generic network agent.
that sensors $n$ and $l$ can communicate, i.e., $\mathcal{E}$ contains the undirected edge $(n, l)$. The graph can be
represented in terms of its $N \times N$ symmetric adjacency matrix $\mathcal{A}$:

$$\mathcal{A}_{nl} = \begin{cases} 1 & \text{if } (n, l) \in \mathcal{E} \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

We assume that the diagonal elements of $\mathcal{A}$ are identically 1, indicating that a sensor $n$ can always
communicate with itself. Note that $\mathcal{E}$ is the maximal allowable set of links in the network at any time;
however, at a particular instant, each sensor may choose to communicate with only a fraction of its neighbors.
The exact communication protocol is not so important for the analysis, as long as some weak connectivity
assumptions are satisfied. For definiteness, we assume the following generic communication model, which
subsumes the widely used gossiping protocol for real time embedded architectures ([5]) and the graph
matching based communication protocols for internet architectures ([6]). We make this precise in the
following, which we generalize later. Define the set $\mathcal{M}$ of symmetric 0-1 $N \times N$ matrices:

$$\mathcal{M} = \{A \mid \mathbf{1}^T A = \mathbf{1}^T, \ A\mathbf{1} = \mathbf{1}, \ A \leq \mathcal{E}\} \qquad (5)$$

In other words, $\mathcal{M}$ is the set of adjacency matrices such that every node is incident to exactly one
edge (including self edges) and allowable edges are only those included in $\mathcal{E}$.⁴ Let $\mathcal{D}$ be a probability
distribution on the space $\mathcal{M}$. The sequence of time-varying adjacency matrices $\{A(t)\}_{t \in \mathbb{N}}$ governing
the inter-sensor communication is then an i.i.d. sequence in $\mathcal{M}$ with distribution $\mathcal{D}$ and independent of
the signal and observation processes.⁵ We make the following assumption of connectivity on the average:
Assumption C.1: Define the symmetric stochastic matrix $\bar{A}$ as

$$\bar{A} = \mathbb{E}[A(t)] = \int_{\mathcal{M}} A \, d\mathcal{D}(A) \qquad (6)$$

The matrix $\bar{A}$ is assumed to be irreducible and aperiodic.
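
As an illustration of the matching set, the mean matrix of eqn. (6), and Assumption C.1, the sketch below assumes NumPy and one particular, hypothetical gossip protocol: a node wakes up uniformly at random and swaps with one uniformly chosen neighbor. The paper does not prescribe this protocol. Irreducibility together with aperiodicity is tested as primitivity of the zero pattern, using the Wielandt bound on the exponent.

```python
import numpy as np

def gossip_matching(adj, rng):
    """Sample one A(t): symmetric 0-1, one 1 per row and column, edges allowed by adj."""
    N = adj.shape[0]
    A = np.eye(N, dtype=int)                  # default: every node keeps its own state
    n = int(rng.integers(N))                  # a node wakes up uniformly at random
    nbrs = [l for l in range(N) if l != n and adj[n, l]]
    if nbrs:
        l = nbrs[int(rng.integers(len(nbrs)))]   # it picks one neighbor uniformly
        A[n, n] = A[l, l] = 0
        A[n, l] = A[l, n] = 1                 # n and l swap states in this slot
    return A

def mean_adjacency(adj):
    """Exact mean E[A(t)] of the sampler above, summing over all (n, l) choices."""
    N = adj.shape[0]
    Abar = np.zeros((N, N))
    for n in range(N):
        nbrs = [l for l in range(N) if l != n and adj[n, l]]
        if not nbrs:
            Abar += np.eye(N) / N
        for l in nbrs:
            A = np.eye(N)
            A[n, n] = A[l, l] = 0.0
            A[n, l] = A[l, n] = 1.0
            Abar += A / (N * len(nbrs))
    return Abar

def is_irreducible_aperiodic(Abar):
    """A nonnegative N x N matrix is irreducible and aperiodic iff it is primitive,
    i.e. iff its (N^2 - 2N + 2)-th power is entrywise positive (Wielandt bound)."""
    N = Abar.shape[0]
    B = (np.asarray(Abar) > 0).astype(int)
    P = B.copy()
    for _ in range(N * N - 2 * N + 1):
        P = ((P @ B) > 0).astype(int)         # keep only the zero pattern
    return bool(P.all())

# Path graph on 4 nodes, with self-loops on the diagonal as assumed in the text.
adj = np.eye(4, dtype=int)
for n in range(3):
    adj[n, n + 1] = adj[n + 1, n] = 1

A_t = gossip_matching(adj, np.random.default_rng(0))
Abar = mean_adjacency(adj)
```

For this connected path graph, `Abar` is symmetric, has unit row sums, and passes the primitivity test, whereas the identity matrix (no communication) or a pure two-node swap would fail it.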

Remark 1 The stochasticity of $\bar{A}$ is inherited from that of the elements of $\mathcal{M}$. We are not concerned with
the properties of the distribution $\mathcal{D}$ as long as the weak connectivity assumption above is satisfied. The
issue of $\bar{A}$ being irreducible depends both on the set of allowable edges $\mathcal{E}$ and the distribution $\mathcal{D}$. We do
not pursue that question in detail here. However, to show the applicability of Assumption C.1 and justify
the notion of weak connectivity, we note that such a distribution $\mathcal{D}$ always exists if the graph $(V, \mathcal{E})$ is
connected. We give a Markov chain interpretation of the mean adjacency matrix $\bar{A}$, which will be helpful
for the analysis to follow. The matrix $\bar{A}$ can be associated with the transition kernel of a time-homogeneous
Markov chain on the state space $V$. Since the state space $V$ is finite, the irreducibility of $\bar{A}$ implies
that the resulting Markov chain is positive recurrent. Due to the symmetry of $\bar{A}$, the Markov chain is reversible
with unique invariant distribution $\pi$ on $V$, where $\pi$ is the discrete uniform distribution on $V$.

4 The set $\mathcal{M}$ is always non-empty; in particular, the $N \times N$ identity matrix $I_N \in \mathcal{M}$.

5 For convenience of presentation, we assume that $A(0) = I_N$, although communication starts at slot $t = 1$.
Observability Conditions: Weak Detectability Successful filtering even in the centralized setting
(assuming all the sensors can forward their observations at all times to a fusion center) requires some
form of detectability and stabilizability. In the present distributed setting we impose the following weak
assumptions on the signal/observation model:

Stabilizability: Assumption S.1 The pair $(\mathcal{F}, Q^{1/2})$ is stabilizable. The non-degeneracy of $Q$ ensures
this.

For distributed detectability, we assume the following: 

Weak Detectability: Assumption D.1 There exists a walk⁶ of length $\ell \geq 1$, $(n_1, n_2, \cdots, n_\ell)$, covering
the $N$ nodes, such that the matrix $\sum_{i=1}^{\ell} (\mathcal{F}^{i-1})^T C_{n_i}^T C_{n_i} \mathcal{F}^{i-1}$ is invertible.

Remark 2 Note that, as permitted by the general definition of a walk, the sequence $(n_1, n_2, \cdots, n_\ell)$ may
consist of repeated vertices and, in particular, self-loops (if permitted by $\bar{A}$).

Remark 3 When $\mathcal{F}$ is invertible, D.1 may be replaced by the full rank of

$$G = \sum_{n=1}^{N} C_n^T C_n \qquad (7)$$

Indeed, by the irreducibility of $\bar{A}$ (equivalently, by the connectivity of the graph induced by $\bar{A}$), we
can find a walk $(n_1, n_2, \cdots, n_\ell)$ of length $\ell \geq N$ which covers the network, i.e., visits each node at
least once. Hence, if $\mathcal{F}$ is invertible and $G$ in (7) has full rank, it follows that the matrix $\sum_{i=1}^{\ell} (\mathcal{F}^{i-1})^T C_{n_i}^T C_{n_i} \mathcal{F}^{i-1}$
corresponding to this walk is invertible, leading to Assumption D.1.

Remark 4 From the positive definiteness of the measurement noise matrices $\mathcal{R}_n$, it follows that under D.1,
the matrix $\sum_{i=1}^{\ell} (\mathcal{F}^{i-1})^T C_{n_i}^T \mathcal{R}_{n_i}^{-1} C_{n_i} \mathcal{F}^{i-1}$ is invertible.

Remark 5 Assumption D.1 is minimal in the sense that, even in a centralized setting (a center has access
to all the sensor observations over all time), it is required to ensure detectability for arbitrary choice of
the matrix $\mathcal{F}$ governing the signal dynamics. This justifies the nomenclature weak detectability.
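
The matrix in Assumption D.1 can be accumulated along a candidate walk and tested for invertibility. The following is a minimal sketch, assuming NumPy; the three-node example (two sensing nodes joined by a non-sensing relay, nodes indexed from 0) and the helper names are hypothetical.

```python
import numpy as np

def walk_gramian(F, C_list, walk):
    """Sum over the walk (n_1, ..., n_l) of (F^{i-1})^T C_{n_i}^T C_{n_i} F^{i-1}."""
    M = F.shape[0]
    G = np.zeros((M, M))
    Fpow = np.eye(M)                    # holds F^{i-1}, starting at i = 1
    for n in walk:
        CF = C_list[n] @ Fpow
        G += CF.T @ CF
        Fpow = F @ Fpow
    return G

def is_invertible(G, tol=1e-9):
    """Full-rank test with an absolute threshold on the singular values."""
    return bool(np.linalg.matrix_rank(G, tol=tol) == G.shape[0])

# Hypothetical path 0 - 1 - 2: node 0 sees the first mode, node 2 the second,
# and node 1 is a relay with a zero observation matrix.
F = np.array([[1.1, 0.0], [0.0, 1.05]])
C_list = [np.array([[1.0, 0.0]]), np.zeros((1, 2)), np.array([[0.0, 1.0]])]

G_cover = walk_gramian(F, C_list, [0, 1, 2])   # walk covering all nodes
G_stay = walk_gramian(F, C_list, [0, 0, 0])    # self-loops at node 0 only
G_sum = sum(C.T @ C for C in C_list)           # matrix G of eqn. (7)
```

In this example the covering walk yields an invertible matrix, while staying at node 0 does not, since node 0 never observes the second mode.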

Algorithm GIKF We now present the algorithm GIKF (gossip based interacting Kalman filter) for
distributed estimation of the signal process $x_t$ over time. We start by introducing notation. Let the filter
at sensor $n$ be initialized with the pair $(\hat{x}_{0|-1}, P_0)$, where $\hat{x}_{0|-1}$ denotes the prior estimate of $x_0$ (with
no observation information) and $P_0$ the corresponding error covariance. Also, by $(\hat{x}^n_{t|t-1}, \widehat{P}^n_t)$ denote the
estimate at sensor $n$ of $x_t$ based on information⁷ till time $t-1$ and the corresponding conditional error
covariance, respectively. The pair $(\hat{x}^n_{t|t-1}, \widehat{P}^n_t)$ is also referred to as the state of sensor $n$ at time $t-1$.
To define the estimate update rule for the GIKF, let $\rightarrow(n, t)$ be the neighbor of sensor $n$ at time $t$

6 A walk in this context is defined w.r.t. the graph induced by the non-zero entries of the matrix $\bar{A}$.

7 The information at sensor $n$ till (and including) time $t$ corresponds to the sequence of observations $\{y^n_s\}_{0 \leq s \leq t}$ obtained at
the sensor and the information received by data exchange with its neighboring sensors.

w.r.t. the adjacency matrix⁸ $A(t)$. We assume that all inter-sensor communication for time $t$ occurs at
the beginning of the slot, whereby communicating sensors swap their previous states, i.e., if at time $t$,
$\rightarrow(n, t) = l$, sensor $n$ replaces its previous state $(\hat{x}^n_{t|t-1}, \widehat{P}^n_t)$ by $(\hat{x}^l_{t|t-1}, \widehat{P}^l_t)$ and sensor $l$ replaces its
previous state $(\hat{x}^l_{t|t-1}, \widehat{P}^l_t)$ by $(\hat{x}^n_{t|t-1}, \widehat{P}^n_t)$. The estimate update at sensor $n$ at the end of the slot (after
the communication and observation tasks have been completed) is:



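$$\hat{x}^n_{t+1|t} = \mathbb{E}\left[ x_{t+1} \,\middle|\, \hat{x}^{\rightarrow(n,t)}_{t|t-1}, \ \widehat{P}^{\rightarrow(n,t)}_t, \ y^n_t \right] \qquad (8)$$

$$\widehat{P}^n_{t+1} = \mathbb{E}\left[ \left( x_{t+1} - \hat{x}^n_{t+1|t} \right) \left( x_{t+1} - \hat{x}^n_{t+1|t} \right)^T \,\middle|\, \hat{x}^{\rightarrow(n,t)}_{t|t-1}, \ \widehat{P}^{\rightarrow(n,t)}_t, \ y^n_t \right] \qquad (9)$$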



Due to conditional Gaussianity, the filtering steps above can be implemented through the time-varying
Kalman filter recursions, and it follows that the sequence $\{\widehat{P}^n_t\}$ of conditional predicted error covariance
matrices at sensor $n$ satisfies the Riccati recursion:

$$\widehat{P}^n_{t+1} = \mathcal{F} \widehat{P}^{\rightarrow(n,t)}_t \mathcal{F}^T + Q - \mathcal{F} \widehat{P}^{\rightarrow(n,t)}_t C_n^T \left( C_n \widehat{P}^{\rightarrow(n,t)}_t C_n^T + \mathcal{R}_n \right)^{-1} C_n \widehat{P}^{\rightarrow(n,t)}_t \mathcal{F}^T \qquad (10)$$

Note that the sequence $\{\widehat{P}^n_t\}$ is random, due to the random neighborhood selection function $\rightarrow(n, t)$.
The goal of the paper is to study asymptotic properties of the sequence of random conditional error
covariance matrices $\{\widehat{P}^n_t\}$ at every sensor $n$ and show in what sense they reach consensus, so that, in
the limit of large time, every sensor provides an equally good (stable in the sense of estimation error)
estimate of the signal process.
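
One slot of the GIKF, i.e., the state swap followed by the local Kalman one-step prediction whose covariance part is the Riccati recursion (10), might be sketched as follows. This is an illustrative sketch assuming NumPy; the function names, the list-of-tuples state layout, and the numerical values are assumptions, not the authors' implementation.

```python
import numpy as np

def kalman_predict_update(x_prev, P_prev, y, F, Q, C, R):
    """Map (x_{t|t-1}, P_t) and the observation y_t to (x_{t+1|t}, P_{t+1})."""
    S = C @ P_prev @ C.T + R                       # innovation covariance
    K = F @ P_prev @ C.T @ np.linalg.inv(S)        # one-step predictor gain
    x_next = F @ x_prev + K @ (y - C @ x_prev)
    P_next = F @ P_prev @ F.T + Q - K @ C @ P_prev @ F.T   # Riccati recursion (10)
    return x_next, P_next

def gikf_step(states, A_t, ys, F, Q, C_list, R_list):
    """states[n] = (estimate, covariance) of sensor n; A_t is the matching for this slot."""
    N = len(states)
    partner = [int(np.argmax(A_t[n])) for n in range(N)]   # neighbor chosen by A_t
    received = [states[partner[n]] for n in range(N)]      # swap at the start of the slot
    # Each sensor filters the state it now holds with its OWN observation.
    return [kalman_predict_update(received[n][0], received[n][1], ys[n],
                                  F, Q, C_list[n], R_list[n])
            for n in range(N)]

# Hypothetical two-sensor example in which the sensors swap states in this slot.
F = np.array([[1.1, 0.0], [0.0, 1.05]])
Q = 0.1 * np.eye(2)
C_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
R_list = [np.array([[0.5]]), np.array([[0.5]])]
states = [(np.zeros(2), np.eye(2)), (np.ones(2), 2.0 * np.eye(2))]
A_t = np.array([[0, 1], [1, 0]])
ys = [np.array([0.3]), np.array([-0.2])]

new_states = gikf_step(states, A_t, ys, F, Q, C_list, R_list)
```

After the call, sensor 0 holds the result of applying its own Riccati update to the covariance it received from sensor 1, and vice versa.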



B. An Interacting Particle Representation 

To compactify the notation in eqn. (10), we define the functions $f_n : \mathbb{S}^M_+ \to \mathbb{S}^M_+$ for $n = 1, \cdots, N$
defining the respective Riccati operators⁹:

$$f_n(X) = \mathcal{F} X \mathcal{F}^T + Q - \mathcal{F} X C_n^T \left( C_n X C_n^T + \mathcal{R}_n \right)^{-1} C_n X \mathcal{F}^T \qquad (11)$$

Recall the sequence $\{\rightarrow(n, t)\}_{t \in \mathbb{T}_+}$ of neighborhoods of sensor $n$. The sequence of conditional error
covariance matrices $\{\widehat{P}^n_t\}_{t \in \mathbb{T}_+}$ at sensor $n$ then evolves according to

$$\widehat{P}^n_{t+1} = f_n\left( \widehat{P}^{\rightarrow(n,t)}_t \right) \qquad (12)$$

The sequence $\{\widehat{P}^n_t\}$ is non-Markov (not even semi-Markov given the random adjacency matrix sequence
$\{A(t)\}$), as $\widehat{P}^n_{t+1}$ at time $t$ is a random functional of the conditional error covariance at time $t-1$ of the
sensor $\rightarrow(n, t)$, which, in general, is different from sensor $n$. This makes the evolution of the sequence
$\{\widehat{P}^n_t\}$ difficult to track. To overcome this, we give the following interacting particle interpretation of

8 Note that $\rightarrow(n, t)$ is unambiguously defined as $A(t)$ is a matching matrix, and also by symmetry we have $\rightarrow(\rightarrow(n, t), t) = n$.
It is possible that $\rightarrow(n, t) = n$, in which case the graph corresponding to $A(t)$ has a self-loop at node $n$.

9 In case a sensor does not observe, i.e., $C_n = 0$, the corresponding Riccati operator $f_n$ in eqn. (11) reduces to the
Lyapunov operator.
the conditional error covariance evolution, which naturally leads us to track semi-Markov sequences of
conditional error covariance matrices from which we can completely characterize the evolution of the
desired covariance sequences $\{\widehat{P}^n_t\}$ for $n = 1, \cdots, N$.

To this end, we note that the link formation process given by the sequence $\{A(t)\}$ can be represented
in terms of $N$ particles moving on the graph as identical Markov chains. The state of the $n$-th particle
is denoted by $p_n(t)$, and the sequence $\{p_n(t)\}_{t \in \mathbb{T}_+}$ takes values in $[1, \cdots, N]$. The evolution of the $n$-th
particle is given as follows:

$$p_n(t) = \rightarrow(p_n(t-1), t), \quad p_n(0) = n \qquad (13)$$

Recall the (random) neighborhood selection $\rightarrow(n, t)$. Thus, the $n$-th particle can be viewed as originating
from node $n$ at time 0 and then traveling on the graph (possibly changing its location at each time)
according to the link formation process $\{A(t)\}$. The following proposition establishes important statistical
properties of the sequence $\{p_n(t)\}_{t \in \mathbb{T}_+}$:

Proposition 6 

[i] For each $n$, the process $\{p_n(t)\}_{t \in \mathbb{T}_+}$ is a Markov chain on $V = [1, \cdots, N]$ with transition probability
matrix $\bar{A}$.

[ii] The Markov chain $\{p_n(t)\}_{t \in \mathbb{T}_+}$ is ergodic with the uniform distribution on $V$ being the attracting
invariant measure.

Proof: For part [i], we note that, by the independence of $\{A(t)\}$, for any $t \in \mathbb{T}_+$ and $l_t, \cdots, l_0 \in V$,

$$\begin{aligned} \mathbb{P}\left[ p_n(t) = l_t \mid p_n(t-1) = l_{t-1}, \cdots, p_n(1) = l_1, p_n(0) = l_0 \right] &= \mathbb{P}\left[ p_n(t) = l_t \mid p_n(t-1) = l_{t-1} \right] \\ &= \mathbb{P}\left[ \rightarrow(l_{t-1}, t-1) = l_t \right] \\ &= \mathbb{P}\left[ A_{l_{t-1} l_t}(t-1) = 1 \right] \\ &= \bar{A}_{l_{t-1} l_t} \end{aligned} \qquad (14)$$

where the last step follows from the fact that the entries of $A(t-1)$ are binary. This establishes the
desired Markovianity of the sequence $\{p_n(t)\}_{t \in \mathbb{T}_+}$.

For part [ii], since the state space $V$ is finite, the irreducibility of $\bar{A}$ implies its positive recurrence and
hence the invariant measure (the uniform distribution on $V$) is unique. That this measure is attracting
follows from the aperiodicity of $\bar{A}$. ■
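
The attraction to the uniform distribution in part [ii] can be observed numerically by propagating the law of a particle through powers of the mean adjacency matrix. The sketch below assumes NumPy; the 3-node symmetric stochastic matrix is a hypothetical example (its self-loops at the end nodes make the chain aperiodic).

```python
import numpy as np

def distribution_after(Abar, start, t):
    """Law of the particle location after t steps when it starts at node `start`."""
    e = np.zeros(Abar.shape[0])
    e[start] = 1.0
    return e @ np.linalg.matrix_power(Abar, t)

# Hypothetical symmetric stochastic mean adjacency matrix on a 3-node path.
Abar = np.array([[0.5, 0.5, 0.0],
                 [0.5, 0.0, 0.5],
                 [0.0, 0.5, 0.5]])
uniform = np.full(3, 1.0 / 3.0)

after_1 = distribution_after(Abar, 0, 1)     # still concentrated near the start
after_60 = distribution_after(Abar, 0, 60)   # numerically indistinguishable from uniform
```

Because the matrix is symmetric, the uniform vector is invariant (`uniform @ Abar` equals `uniform`), and the law of the particle approaches it geometrically fast regardless of the starting node.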

For each of the Markov chains $\{p_n(t)\}_{t \in \mathbb{T}_+}$, we define a sequence of switched Riccati iterates $\{P_n(t)\}$:

$$P_n(t+1) = f_{p_n(t)}(P_n(t)) \qquad (15)$$

The sequence $\{P_n(t)\}_{t \in \mathbb{T}_+}$ can be viewed as an iterated system of Riccati maps, the random switching
sequence being governed by the Markov chain $\{p_n(t)\}_{t \in \mathbb{T}_+}$. A more intuitive explanation comes from
the particle interpretation: the $n$-th sequence may be viewed as a particle originating at node
$n$ and hopping around the network as a Markov chain with transition probability $\bar{A}$ whose instantaneous
state $P_n(t)$ evolves by the application of the Riccati operator corresponding to its current location. In
particular, in contrast to the sequence of conditional error covariances at sensor $n$, $\{\widehat{P}_n(t)\}_{t \in \mathbb{T}_+}$ (the
sequence governed by eqn. (12)), the sequence $\{P_n(t)\}_{t \in \mathbb{T}_+}$ does not correspond to the error evolution at
a particular sensor. The following proposition shows that the sequence $\{P_n(t)\}_{t \in \mathbb{T}_+}$ is semi-Markov and
establishes its relation to the sequence $\{\widehat{P}_n(t)\}_{t \in \mathbb{T}_+}$ of interest.

Proposition 7 

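[i] The sequence $\{P_n(t)\}_{t \in \mathbb{T}_+}$ is semi-Markov, given the Markov switching sequence, i.e.,

$$\mathbb{E}\left[ \mathbb{I}_\Gamma(P_n(t+1)) \mid \{P_n(s), p_n(s)\}_{0 \leq s \leq t} \right] = \mathbb{E}\left[ \mathbb{I}_\Gamma(P_n(t+1)) \mid P_n(t), p_n(t) \right], \quad \forall t \in \mathbb{T}_+, \ \Gamma \in \mathbb{M}(\mathbb{S}^M_+) \qquad (16)$$

[ii] Consider the sequence of random permutations $\{\pi_t\}_{t \in \mathbb{T}_+}$ on $V$, given by

$$(\pi_{t+1}(1), \cdots, \pi_{t+1}(N)) = (\rightarrow(\pi_t(1), t), \cdots, \rightarrow(\pi_t(N), t)) \qquad (17)$$

with initial condition

$$(\pi_0(1), \cdots, \pi_0(N)) = (1, \cdots, N) \qquad (18)$$

(Note that $\pi_t(n) = p_n(t)$ for every $n$, where $p_n(t)$ is defined in eqn. (13).) Then, for $t \in \mathbb{T}_+$,

$$(P_1(t+1), \cdots, P_N(t+1)) = \left( \widehat{P}_{\pi_t(1)}(t+1), \cdots, \widehat{P}_{\pi_t(N)}(t+1) \right) \qquad (19)$$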

Part [ii] of the above proposition suggests that the asymptotics of the desired sequence $\{\widehat{P}_n(t)\}_{t \in \mathbb{T}_+}$
for every $n$ can be obtained by studying the same for the sequences $\{P_n(t)\}_{t \in \mathbb{T}_+}$. Also, part [i] of
Proposition 7 demonstrates the nice structure of the sequence $\{P_n(t)\}_{t \in \mathbb{T}_+}$. In the following, in particular,
we will show that the sequences $\{P_n(t)\}_{t \in \mathbb{T}_+}$ reach consensus in a weak sense, which by part [ii] will
establish weak consensus for the sequences $\{\widehat{P}_n(t)\}_{t \in \mathbb{T}_+}$ of interest. Hence, in the subsequent sections,
we will study the sequences $\{P_n(t)\}_{t \in \mathbb{T}_+}$, rather than working directly with the sequences $\{\widehat{P}_n(t)\}_{t \in \mathbb{T}_+}$
of interest, which involve a much more complicated statistical dependence.
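
The correspondence between particle covariances and sensor covariances can be checked by simulating both recursions under the same link formation process. The sketch below assumes NumPy and takes the indexing of eqns. (12), (13), and (15) as written, with no swap at time 0; the random pairing protocol `random_partner_map`, the complete communication graph, and all numerical values are hypothetical. Under these assumptions, the covariance carried by particle n after each update should coincide with the covariance held by the sensor at the particle's current location.

```python
import numpy as np

def riccati(X, F, Q, C, R):
    """Riccati operator of eqn. (11) for a sensor with matrices (C, R)."""
    S = C @ X @ C.T + R
    return F @ X @ F.T + Q - F @ X @ C.T @ np.linalg.solve(S, C @ X @ F.T)

def random_partner_map(N, rng):
    """Hypothetical matching: shuffle the nodes, pair consecutive ones w.p. 1/2."""
    partner = np.arange(N)
    perm = rng.permutation(N)
    for i in range(0, N - 1, 2):
        if rng.random() < 0.5:
            a, b = perm[i], perm[i + 1]
            partner[a], partner[b] = b, a
    return partner

def max_particle_sensor_gap(F, Q, C_list, R_list, P0, T, rng):
    N = len(C_list)
    f = lambda n, X: riccati(X, F, Q, C_list[n], R_list[n])
    P_hat = [P0.copy() for _ in range(N)]    # sensor covariances at time 0
    P_par = [P0.copy() for _ in range(N)]    # particle covariances at time 0
    loc = np.arange(N)                       # particle n starts at node n
    partner = np.arange(N)                   # identity matching at time 0
    worst = 0.0
    for _ in range(T):
        # Sensors: swap according to the current matching, then apply own operator.
        P_hat = [f(n, P_hat[partner[n]]) for n in range(N)]
        # Particles: apply the operator of the node where the particle currently is.
        P_par = [f(loc[n], P_par[n]) for n in range(N)]
        gap = max(np.abs(P_par[n] - P_hat[loc[n]]).max() for n in range(N))
        worst = max(worst, gap)
        # Draw the next matching and move every particle to its new location.
        partner = random_partner_map(N, rng)
        loc = partner[loc]
    return worst

F = np.array([[1.1, 0.0], [0.0, 1.05]])
Q = 0.1 * np.eye(2)
C_list = [np.array([[1.0, 0.0]]), np.zeros((1, 2)),
          np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
R_list = [np.array([[0.5]]) for _ in range(4)]

worst_gap = max_particle_sensor_gap(F, Q, C_list, R_list, np.eye(2), T=30,
                                    rng=np.random.default_rng(1))
```

A vanishing `worst_gap` reflects that the particles simply relabel the sensor covariances through a permutation at each time, which is why studying the particle sequences suffices.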



III. Main Results 

In this section, we present and discuss the main results of the paper under Assumptions C.1, S.1, D.1
(see Section II-A). The first result does not directly concern the sequences $\{\widehat{P}_n(t)\}$ for $n = 1, \cdots, N$, but sets
the stage for presenting the key result regarding the convergence of these sequences and is of independent
interest.

Theorem 8 For a given $\bar{A}$, let $\{\widetilde{p}(t)\}_{t \in \mathbb{T}_+}$ be a stationary Markov chain on $V$ with transition probability
matrix $\bar{A}$, i.e., $\widetilde{p}(0)$ is distributed uniformly on $V$. Let $\nu$ be a probability measure on $\mathbb{S}^M_+$ and consider
the random process $\{\widetilde{P}(t)\}_{t \in \mathbb{T}_+}$ given by

$$\widetilde{P}(t+1) = f_{\widetilde{p}(t)}(\widetilde{P}(t)), \quad t \in \mathbb{T}_+ \qquad (20)$$

where $\widetilde{P}(0)$ is distributed as $\nu$ and independent of the Markov chain $\{\widetilde{p}(t)\}$. Then, there exists a unique probability
measure $\mu^{\bar{A}}$ (depending on $\bar{A}$ only) such that, for every $\nu$, the process $\{\widetilde{P}(t)\}$ constructed above
converges weakly to $\mu^{\bar{A}}$. In other words, for any $\nu$, if $\widetilde{P}(0) \sim \nu$ and is independent of $\{\widetilde{p}(t)\}$, we have as
$t \to \infty$ that the composition of Riccati operators converges in distribution:

$$f_{\widetilde{p}(t)} \circ f_{\widetilde{p}(t-1)} \circ \cdots \circ f_{\widetilde{p}(0)}(\widetilde{P}(0)) \Longrightarrow \mu^{\bar{A}} \qquad (21)$$

Remark 9 We stress here that the dependence of the invariant measure $\mu^{\bar{A}}$ on the communication policy
$\mathcal{D}$ manifests only through the mean matrix $\bar{A}$.

We now state the key result characterizing the convergence properties of the sequences $\{\widehat{P}_n(t)\}$.

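Theorem 10 [i] Let $q$ be a uniformly distributed random variable on $V$, independent of the sequence
of adjacency matrices $\{A(t)\}_{t \in \mathbb{T}_+}$. Then, the sequence $\{\widehat{P}_q(t)\}_{t \in \mathbb{T}_+}$ converges weakly to $\mu^{\bar{A}}$ (the latter
being defined in Theorem 8), i.e.,

$$\widehat{P}_q(t) \Longrightarrow \mu^{\bar{A}} \qquad (22)$$

In other words, the conditional error covariance $\{\widehat{P}_q(t)\}$ of a randomly selected sensor (estimator)
converges in distribution to $\mu^{\bar{A}}$.

[ii] For every $n \in [1, \cdots, N]$, the sequence $\{P_n(t)\}_{t \in \mathbb{T}_+}$ (or the sequence $\{\widehat{P}_{\pi_t(n)}(t)\}_{t \in \mathbb{T}_+}$) is stochastically
dominated by the distribution $\mu^{\bar{A}}$ as $t \to \infty$, i.e., for every $a > 0$, we have

$$\limsup_{t \to \infty} \mathbb{P}\left( \|P_n(t)\| \geq a \right) \leq \mu^{\bar{A}}\left( \{X \in \mathbb{S}^M_+ \mid \|X\| \geq a\} \right) \qquad (23)$$

$$\limsup_{t \to \infty} \mathbb{P}\left( P_n(t) \succeq aI \right) \leq \mu^{\bar{A}}\left( \{X \in \mathbb{S}^M_+ \mid X \succeq aI\} \right) \qquad (24)$$

More generally, for a closed set $F$ preserving monotonicity, i.e., $X \in F$ implies $Y \in F$ for all $Y \succeq X$,
we have

$$\limsup_{t \to \infty} \mathbb{P}\left( P_n(t) \in F \right) \leq \mu^{\bar{A}}(F) \qquad (25)$$

In words, $\forall n$, the pathwise error associated with $\hat{x}_{\pi_t(n)}(t)$ is stochastically dominated by $\mu^{\bar{A}}$.

[iii] For each $n$, the sequence of error covariances $\{\widehat{P}_n(t)\}$ is stochastically bounded,

$$\lim_{J \to \infty} \sup_{t \in \mathbb{T}_+} \mathbb{P}\left( \|\widehat{P}_n(t)\| \geq J \right) = 0 \qquad (26)$$

Specifically, for all closed $F$, we have

$$\limsup_{t \to \infty} \mathbb{P}\left( \widehat{P}_n(t) \in F \right) \leq N \mu^{\bar{A}}(F) \qquad (27)$$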

We discuss the consequences of Theorem 10. The first part of the theorem reinforces the weak consensus
achieved by the GIKF algorithm, i.e., the conditional error covariance at a randomly selected sensor
converges in distribution to the invariant measure $\mu^{\overline{A}}$. Reinterpreted, it provides an estimate $\{\widehat{x}_q(t)\}$ (in
practice, obtained by uniformly selecting a sensor $q$ independent of the random gossip protocol $\{A(t)\}$
and using its estimate $\widehat{x}_q(t)$ for all time $t$) with stochastically bounded conditional error covariance under
the weak detectability and connectivity assumptions. Note that the results provided in this paper pertain to
the limiting distribution of the conditional error covariance and, hence, the pathwise filtering error. This is
a much stronger result than providing moment estimates of the conditional error covariance, which do
not provide much insight into the pathwise instantiations of the filter. In this paper, we do not provide
analytic characterizations of the resulting invariant measure $\mu^{\overline{A}}$. However, Theorem 8 also provides an
efficient numerical characterization of $\mu^{\overline{A}}$. In particular, the weak convergence in eqn. (21) shows that
the empirical distribution obtained by plotting repeated instantiations of the process $\{\widetilde{P}(t)\}$ (eqn. (20))
would converge to $\mu^{\overline{A}}$.
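The numerical characterization mentioned above can be sketched as follows. This is only an illustrative sketch under strong assumptions, not the paper's experiment: the state is scalar ($M = 1$), and the gains `F`, `Q`, `C`, `R` and the matrix `A_BAR` are hypothetical values chosen so that a single sensor observes an unstable signal. The sorted terminal values of many independent runs of the switched Riccati iteration form an empirical approximation of the limiting law.

```python
import random

# Hypothetical scalar model (state dimension M = 1); none of these numbers
# come from the paper.
F, Q = 1.2, 1.0                  # unstable signal gain, process noise variance
C = [1.0, 0.0, 0.0]              # only sensor 0 observes the signal
R = [1.0, 1.0, 1.0]              # observation noise variances
A_BAR = [[0.50, 0.25, 0.25],     # assumed symmetric stochastic matrix
         [0.25, 0.50, 0.25],
         [0.25, 0.25, 0.50]]


def riccati(n, x):
    """Scalar Riccati operator f_n(x) of sensor n."""
    correction = (F * F * x * x * C[n] ** 2) / (C[n] ** 2 * x + R[n])
    return F * F * x + Q - correction


def terminal_sample(t_max, x0, rng):
    """Run the switched iteration for t_max steps; return the final value."""
    state = rng.randrange(len(C))            # uniform (stationary) start
    x = x0
    for _ in range(t_max):
        x = riccati(state, x)
        state = rng.choices(range(len(C)), weights=A_BAR[state])[0]
    return x


def empirical_samples(runs=500, t_max=100, x0=0.0, seed=0):
    """Sorted terminal values; their histogram approximates the limit law."""
    rng = random.Random(seed)
    return sorted(terminal_sample(t_max, x0, rng) for _ in range(runs))


samples = empirical_samples()
```

Plotting a histogram of `samples`, or reading quantiles off the sorted list, gives a rough picture of the invariant measure for this toy model; the number of runs and the horizon `t_max` are arbitrary choices.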

Another class of estimates obtained by the GIKF algorithm is demonstrated in the second part of
Theorem 10. For each $n$, the estimate $\{\widehat{x}_{\pi_t(n)}(t)\}$ is obtained in practice by starting at the node $n$
and then performing a random walk, $\pi_t(n)$, through the graph and collecting the estimates on the way.
Eqns. (23-25) show that, in the limit as $t \to \infty$, these estimates are at least as good as the estimate
$\{\widehat{x}_q(t)\}$ obtained by probing a randomly selected node and using its estimate throughout. For some
$n$, whether the estimate $\{\widehat{x}_{\pi_t(n)}(t)\}$ is strictly better than the estimate $\{\widehat{x}_q(t)\}$ asymptotically is an
interesting technical question and not resolved in this paper. On the contrary, another possibility may
be an extension of eqn. (25) to all closed $F$, leading to the weak convergence of $\{\widehat{P}_{\pi_t(n)}(t)\}$ to $\mu^{\overline{A}}$ by
Portmanteau's theorem. However, the inequality in eqn. (25) cannot be strict for all $n$, as we have for
all closed $F$ and $\varepsilon > 0$ (see Subsection VI-B),

1 N - 

-^liminfp(P^ (n) (t)€^)>^ (28) 
n=l 

The last part of Theorem 10 shows that weak detectability (which is necessary for the error of a
centralized estimator to be stochastically bounded) is sufficient in the distributed gossip setting to lead to
sensor estimates with stochastically bounded errors. The upper bound presented in eqn. (27) is highly
conservative and, in fact, we have for all closed $F$ (see Subsection VI-B),

$$\sum_{n=1}^{N} \limsup_{t\to\infty} P\left(\widehat{P}_n(t) \in F\right) \le N \mu^{\overline{A}}(F) \tag{29}$$

IV. The auxiliary sequence $\{\widetilde{P}(t)\}$: RDS formulation

The asymptotic analysis of the semi-Markov processes $\{P_n(t)\}$ for $n = 1, \cdots, N$ does not fall under
the purview of standard approaches based on iterated random systems ([25]) or a random dynamical
system (RDS) ([26]) as the switching Markov chains $\{p_n(t)\}$ are non-stationary. In this section, we
consider an auxiliary process $\{\widetilde{P}(t)\}$ whose evolution is governed by similar random Riccati iterates,
the difference being that the switching Markov chain is stationary, i.e., the switching Markov chain
$\{\widetilde{p}(t)\}$ is initialized with the uniform invariant measure on $V$. We analyze the asymptotic properties of
the auxiliary sequence $\{\widetilde{P}(t)\}$ by formulating it as an RDS on the space $\mathbb{S}_+^M$, and then in subsequent
sections we derive the asymptotics of the sequences $\{P_n(t)\}$ for $n = 1, \cdots, N$ through comparison
arguments. We start by formally defining the sequence $\{\widetilde{P}(t)\}$ (see footnote 10):

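Consider a Markov chain $\{\widetilde{p}(t)\}_{t\in\mathbb{T}_+}$ on the graph $V$, with transition matrix $\overline{A}$ and uniform initial
distribution, i.e.,

$$P\left[\widetilde{p}(0) = n\right] = \frac{1}{N}, \quad n = 1, \cdots, N \tag{30}$$

By Proposition 6, the Markov chain $\{\widetilde{p}(t)\}$ is stationary.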
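As a small sanity check of the stationarity claim, the sketch below verifies numerically that the uniform distribution is left unchanged by one step of a chain whose transition matrix is symmetric and stochastic (hence doubly stochastic). That structural property of $\overline{A}$ is an assumption here, since Proposition 6 is stated elsewhere in the paper, and the 3-node matrix `A_BAR` is a hypothetical example.

```python
def step(dist, A):
    """Propagate a row-vector distribution one step: dist -> dist * A."""
    n = len(A)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 3-node example of a symmetric stochastic matrix.
A_BAR = [[0.50, 0.25, 0.25],
         [0.25, 0.50, 0.25],
         [0.25, 0.25, 0.50]]

uniform = [1.0 / 3.0] * 3
after_one_step = step(uniform, A_BAR)   # should again be uniform
```

Because every column of a doubly stochastic matrix sums to one, each entry of `after_one_step` is again $1/N$, which is the invariance used in eqn. (30).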
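We now define the auxiliary process $\{\widetilde{P}(t)\}$ as follows:

$$\widetilde{P}(t+1) = f_{\widetilde{p}(t)}\left(\widetilde{P}(t)\right) \tag{31}$$

with (possibly random) initial condition $\widetilde{P}(0)$ (see footnote 11).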

Before reading the next two sections, we refer the reader to Appendix A where we review preliminary
facts and results from the theory of monotone, sublinear random dynamical systems (RDS) ([27]) tailored
to our needs. We then show in Subsection IV-A that the sequence $\{\widetilde{P}(t)\}$, for each $n$, admits an ergodic
RDS formulation evolving on $\mathbb{S}_+^M$ and establish some of its properties in Subsection IV-B.

A. RDS formulation of $\{\widetilde{P}(t)\}$

In this subsection, we construct an RDS $(\theta^R, \varphi^R)$ on $\mathbb{S}_+^M$, which is equivalent to the auxiliary sequence
$\{\widetilde{P}(t)\}$ in distribution. To this end, we construct the Markov chain $\{\widetilde{p}(t)\}$ (in a distributional sense) on the
canonical path space. Let $\widetilde{\Omega}$ denote the set $\{1, \cdots, N\}$ with $\widetilde{\mathcal{F}}$ denoting the corresponding Borel algebra
on $\widetilde{\Omega}$, which coincides with the power set of $\{1, \cdots, N\}$. Denote by $\Omega^R$ the two-sided infinite product
of sets $\widetilde{\Omega}$, $\Omega^R = \otimes_{t=-\infty}^{\infty} \widetilde{\Omega}$, i.e., $\Omega^R$ is the space of double-sided sequences of entries in $\{1, \cdots, N\}$:

$$\Omega^R = \left\{\omega = (\cdots, \omega_{-1}, \omega_0, \omega_1, \cdots) \mid \omega_t \in \{1, \cdots, N\},\ \forall t \in \mathbb{T}\right\} \tag{32}$$

We equip $\Omega^R$ with the corresponding product Borel algebra $\mathcal{F}^R = \otimes_{t=-\infty}^{\infty} \widetilde{\mathcal{F}}$ generated by the cylinder
sets. Note that $\{\omega_t\}_{t\in\mathbb{T}_+}$ for all $\omega \in \Omega^R$ denotes the canonical path space (trajectory) of the Markov chain
$\{\widetilde{p}(t)\}_{t\in\mathbb{T}_+}$. The reason for introducing two-sided sequences is a matter of technical convenience and will
be evident soon. Consider the unique probability measure $\mathbb{P}^R$ on $\mathcal{F}^R$, under which the (two-sided) stochastic
process $\{\omega_t\}_{t\in\mathbb{T}}$ is a stationary Markov chain on the finite state space $\{1, \cdots, N\}$ with transition
probability matrix $\overline{A}$. By the assumption of stationarity and Proposition 6, the distribution of $\omega_t$ for each
$t \in \mathbb{T}$ is necessarily the uniform distribution on $\{1, \cdots, N\}$. In particular, we note that the stochastic

10 We are interested in the distributional properties of the various processes of concern. The actual pathwise construction is
not of importance as long as the required distributional equivalence holds. We assume that the measure space $(\Omega, \mathcal{F}, \mathbb{P})$ is rich
enough (or suitably extended) to carry out constructions of the various auxiliary random variables.

11 Although the sequences $\{P_n(t)\}$ of interest have deterministic initial conditions, it is required for technical reasons (to be
made precise later) to allow random initial states $\widetilde{P}(0)$ when studying the auxiliary sequence $\{\widetilde{P}(t)\}$.
processes $\{\widetilde{p}(t)\}_{t\in\mathbb{T}_+}$ and $\{\omega_t\}_{t\in\mathbb{T}_+}$ are equivalent in terms of the distribution induced on path space. Define
the family of transformations $\{\theta^R_t\}_{t\in\mathbb{T}}$ on $\Omega^R$ as the family of left-shifts, i.e.,

$$\theta^R_t \omega = \omega(t + \cdot), \quad \forall t \in \mathbb{T} \tag{33}$$

With this, the space $(\Omega^R, \mathcal{F}^R, \mathbb{P}^R, \{\theta^R_t, t \in \mathbb{T}\})$ becomes the canonical path space of a two-sided
stationary sequence equipped with the left-shift operator and hence (see, for example, [28]) satisfies the
Assumptions A.1)-A.3) in Definition 17 to be a metric dynamical system and, in fact, is also ergodic.

We now define the cocycle $\varphi^R$ (see also Definition 17) over $\mathbb{S}_+^M$, which gives the RDS of interest.
We define $\varphi^R : \mathbb{T}_+ \times \Omega^R \times \mathbb{S}_+^M \to \mathbb{S}_+^M$ by:

$$\varphi^R(0, \omega, X) = X, \quad \forall \omega, X \tag{34}$$
$$\varphi^R(1, \omega, X) = f_{\omega_0}(X), \quad \forall \omega, X \tag{35}$$
$$\varphi^R(t, \omega, X) = f_{\theta^R_{t-1}\omega(0)}\left(\varphi^R(t-1, \omega, X)\right) = f_{\omega_{t-1}}\left(\varphi^R(t-1, \omega, X)\right), \quad \forall t > 1, \omega, X \tag{36}$$

(Note that, by property of the left shift $\theta^R$, we have $\theta^R_{t-1}\omega(0) = \omega_{t-1}$, which explains the equality in
eqn. (36).) The cocycle $\varphi^R$ defined satisfies the assumptions of measurability jointly in its arguments, and
the continuity of the map $\varphi^R(t, \omega, \cdot) : \mathbb{S}_+^M \to \mathbb{S}_+^M$ w.r.t. the phase variable $X$ for each fixed $t, \omega$ follows
from the continuity of the corresponding Riccati operator. The pair $(\theta^R, \varphi^R)$ thus forms a well-defined
RDS on the phase space $\mathbb{S}_+^M$. Now consider the sequence of random variables $\{\varphi^R(t, \omega, P_n(0))\}_{t\in\mathbb{T}_+}$ (as
explained earlier, the randomness is induced by $\omega$), which can be viewed as successive (random) iterates
of the RDS $(\theta^R, \varphi^R)$ starting with the initial state $P_n(0)$. By construction, it follows that the sequence
$\{\varphi^R(t, \omega, P_n(0))\}_{t\in\mathbb{T}_+}$ is distributionally equivalent to the sequence $\{\widetilde{P}(t)\}$. In particular,

$$\varphi^R(t, \omega, P_n(0)) \stackrel{d}{=} \widetilde{P}(t), \quad \forall t \in \mathbb{T}_+ \tag{37}$$



Thus, analyzing the asymptotic distributional properties of the sequence $\{\widetilde{P}(t)\}$ is equivalent to
studying the sequence $\{\varphi^R(t, \omega, P_n(0))\}_{t\in\mathbb{T}_+}$, which we undertake in the next subsection.



B. Properties of the RDS $(\theta^R, \varphi^R)$

We establish some basic properties of the RDS $(\theta^R, \varphi^R)$ representing the auxiliary sequence $\{\widetilde{P}(t)\}$.

Lemma 11

[i] The RDS $(\theta^R, \varphi^R)$ is conditionally compact.

[ii] The RDS $(\theta^R, \varphi^R)$ is order preserving.

[iii] If in addition $Q$ is positive definite, i.e., $Q \gg 0$, then $(\theta^R, \varphi^R)$ is strongly sublinear.

Proof: The claim in [i] (conditional compactness) is an immediate consequence of the finite
dimensionality of the underlying vector space $\mathbb{S}^M$.

The order preserving property [ii] follows from the monotonicity of the individual Riccati operators
$f_n$, since finite compositions of monotone operators remain order-preserving.
The strong sublinearity uses the concavity of the Riccati operators and their monotone nature and is
a routine extension to an arbitrary number $N$ of Riccati operators, given the development in [19] (see
Lemma 21 in [19]) for the case of two Riccati operators. ■
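The properties in Lemma 11 can be checked numerically in the simplest setting. The sketch below is an illustration only: it assumes a scalar state, hypothetical parameters `F`, `Q`, `R`, and a finite grid, so it tests the inequalities $x \le y \Rightarrow f(x) \le f(y)$ and $\lambda f(x) < f(\lambda x)$ at sample points rather than proving them. In the scalar case the strict inequality coincides with the strong one.

```python
# Hypothetical scalar parameters (not from the paper).
F, Q, R = 1.2, 1.0, 1.0


def riccati(c, x):
    """Scalar Riccati operator for observation gain c."""
    return F * F * x + Q - (F * F * x * x * c * c) / (c * c * x + R)


def order_preserving(c, grid):
    """Check that x <= y implies f(x) <= f(y) on the grid."""
    return all(riccati(c, x) <= riccati(c, y)
               for x in grid for y in grid if x <= y)


def strictly_sublinear(c, grid, lambdas):
    """Check lam * f(x) < f(lam * x) for x > 0 and 0 < lam < 1."""
    return all(lam * riccati(c, x) < riccati(c, lam * x)
               for x in grid if x > 0 for lam in lambdas)


GRID = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 100.0]
LAMBDAS = [0.1, 0.5, 0.9]
# One pair of booleans per sensor type: blind (c = 0) and observing (c = 1).
checks = [(order_preserving(c, GRID), strictly_sublinear(c, GRID, LAMBDAS))
          for c in (0.0, 1.0)]
```

The strictness comes from $Q > 0$: writing $f(x) = a(x) + Q$ with $a$ concave and $a(0) = 0$ gives $f(\lambda x) \ge \lambda a(x) + Q > \lambda f(x)$, which mirrors the role of the condition $Q \gg 0$ in part [iii].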

V. Asymptotics of $\{\widetilde{P}(t)\}$

The main result here concerns the asymptotic properties of the auxiliary sequences $\{\widetilde{P}(t)\}_{t\in\mathbb{T}_+}$ for
each $n \in [1, \cdots, N]$. We have the following:

Theorem 12 Under the assumptions C.1, S.1, D.1 (see page 7), there exists a unique equilibrium probability
measure $\mu^{\overline{A}}$ on the space of positive semidefinite matrices $\mathbb{S}_+^M$, such that, for each $n \in [1, \cdots, N]$, the
sequence $\{\widetilde{P}(t)\}$ converges weakly (in distribution) to $\mu^{\overline{A}}$ from every initial condition $P_n(0)$:

$$\{\widetilde{P}(t)\} \Rightarrow \mu^{\overline{A}}, \quad \forall n \in [1, \cdots, N] \tag{38}$$

The rest of the subsection is devoted to the proof of the above result. But, before that, we highlight some 
consequences of Theorem 12. 

Remark 13 It is important to note, as stated in Theorem 12, that the equilibrium measure $\mu^{\overline{A}}$ does not
depend on the index $n$ and the initial state $\widetilde{P}(0)$ of the sequence $\{\widetilde{P}(t)\}$, but is a functional of the
network topology and the particular (randomized) communication protocol captured by the matrix $\overline{A}$.
Theorem 12 thus concludes that the sequences $\{\widetilde{P}(t)\}$ reach consensus in the weak sense to the same
equilibrium measure irrespective of the initial states.

The proof of Theorem 12 is rather long and technical, which we accomplish in steps. 

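Lemma 14 Recall Assumption D.1, page 7, and let, in particular, $w_0 = (n_1, \cdots, n_\ell)$ be a walk such that
the Grammian

$$G_{w_0} = \sum_{i=1}^{\ell} \left(\mathcal{F}^{i-1}\right)^T C_{n_i}^T C_{n_i} \mathcal{F}^{i-1} \tag{39}$$

is invertible, where $\ell \ge 1$. Define the function $g_{w_0} : \mathbb{S}_+^M \to \mathbb{S}_+^M$ by

$$g_{w_0}(X) = f_{n_\ell} \circ f_{n_{\ell-1}} \circ \cdots \circ f_{n_1}(X) \tag{40}$$

Then, there exists a constant $\alpha_0 > 0$ such that the following uniformity condition holds:

$$g_{w_0}(X) \preceq \alpha_0 I, \quad \forall X \in \mathbb{S}_+^M \tag{41}$$

In other words, the iterate $g_{w_0}(\cdot)$ is uniformly bounded irrespective of the value of the argument.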
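To make the uniformity condition concrete, consider a hedged scalar sketch with a two-step walk over one blind sensor ($C = 0$) followed by one observing sensor. All parameters are hypothetical. For this walk the Grammian equals $F^2 C_{\text{obs}}^2 > 0$, and since the scalar Riccati operator can be rewritten as $f(x) = F^2 R x / (C^2 x + R) + Q$, the composition stays below $\alpha_0 = F^2 R / C^2 + Q$ for every argument.

```python
# Hypothetical scalar example: one observing sensor and one blind sensor.
F, Q, R, C_OBS = 1.2, 1.0, 1.0, 1.0


def f_obs(x):
    """Riccati operator of the observing sensor."""
    return F * F * x + Q - (F * F * x * x * C_OBS ** 2) / (C_OBS ** 2 * x + R)


def f_blind(x):
    """Riccati operator of a sensor with C = 0 (a pure Lyapunov step)."""
    return F * F * x + Q


def g_walk(x):
    """g_{w_0} for the walk w_0 = (blind, obs): f_obs is applied last."""
    return f_obs(f_blind(x))


# Candidate uniform bound for this walk in the scalar case.
ALPHA_0 = F * F * R / C_OBS ** 2 + Q
values = [g_walk(x) for x in (0.0, 1.0, 1e2, 1e4, 1e6)]
```

The blind step alone grows without bound in `x`, whereas every entry of `values` remains below `ALPHA_0`, which is the behavior that eqn. (41) asserts for general walks with an invertible Grammian.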

The proof is provided in Appendix B. Note that in eqn. (39) the observation matrix $C_{n_i}$ is indexed
by $n_i$, the current site visited by the random walk $w_0$ introduced in Lemma 14. Also, note that the
function $g_{w_0}(X)$ defined in eqn. (40) is indexed by the walk $w_0$.
The following key lemma establishes asymptotic boundedness properties of $\{\widetilde{P}(t)\}$ and is proved in
Appendix B.

Lemma 15 The sequence $\{\widetilde{P}(t)\}$ is stochastically bounded for each $n$ under the assumptions of
Theorem 12, i.e.,

$$\lim_{J\to\infty} \sup_{t\in\mathbb{T}_+} P\left(\|\widetilde{P}(t)\| \ge J\right) = 0 \tag{42}$$



We now complete the proof of Theorem 12. 

From Lemma 11 we note that $(\theta^R, \varphi^R)$ is strongly sublinear, conditionally compact and
order-preserving. Also, the cone $\mathbb{S}_+^M$ satisfies the conditions required in the hypothesis of Theorem 26. We note
for $t > 0$

$$\varphi^R(t, \omega, 0) = f_{\omega(t-1)}\left(\varphi^R(t-1, \omega, 0)\right) \succeq Q \gg 0 \tag{43}$$

Thus the hypotheses of Theorem 26 are satisfied, and precisely one of the assertions a) and b) holds.
By an argument similar to Lemma 23 in [19], we can show that assertion a) cannot hold in the face
of stochastic boundedness of the sequence $\{\widetilde{P}(t)\}$ (Lemma 15). Thus assertion b) holds, and, as a
direct consequence of Theorem 26, we establish the existence of a unique almost equilibrium $u^{\overline{A}}(\omega) \gg 0$
defined on a $\theta^R$-invariant set $\Omega^* \in \mathcal{F}^R$ with $\mathbb{P}^R(\Omega^*) = 1$ such that, for any random variable $v(\omega)$ possessing
the property $0 \preceq v(\omega) \preceq \alpha u^{\overline{A}}(\omega)$ for all $\omega \in \Omega^*$ and deterministic $\alpha > 0$, the following holds:

$$\lim_{t\to\infty} \varphi^R\left(t, \theta_{-t}\omega, v(\theta_{-t}\omega)\right) = u^{\overline{A}}(\omega), \quad \omega \in \Omega^* \tag{44}$$

By the distributional equivalence of pull-back and forward orbits (Lemma 23), the existence
of a unique almost equilibrium $u^{\overline{A}}$ yields a unique equilibrium measure for the process $\{\widetilde{P}(t)\}$.
However, to show that the measure induced by
$u^{\overline{A}}$ on $\mathbb{S}_+^M$ is attracting for $\{\widetilde{P}(t)\}$, eqn. (44) must hold for all initial $v$, whereas so far
convergence is established only for a restricted class of initial conditions $v$. We need the following result to extend it to
general initial conditions.

Lemma 16 Under the assumptions of Theorem 12, let $u^{\overline{A}}$ be the unique almost equilibrium of the RDS
$(\theta^R, \varphi^R)$. Then

$$\mathbb{P}^R\left(\omega : u^{\overline{A}}(\omega) \succeq Q\right) = 1 \tag{45}$$

Proof: The proof uses the fact that, for all $n$, $f_n(X) \succeq Q$, and is routine given the corresponding
development in Lemma 24 of [19]. ■
We now complete the proof of Theorem 12. 

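Proof of Theorem 12: Let $\mu^{\overline{A}}$ be the distribution of the unique almost equilibrium in eqn. (44).
By Lemma 16 we have $\mu^{\overline{A}}(\mathbb{S}_{++}^M) = 1$. Let $P_0 \in \mathbb{S}_+^M$ be an arbitrary initial state. By construction of
the RDS $(\theta^R, \varphi^R)$, the sequences $\{\widetilde{P}_t\}_{t\in\mathbb{T}_+}$ and $\{\varphi^R(t, \omega, P_0)\}_{t\in\mathbb{T}_+}$ are distributionally equivalent, i.e.,
$\widetilde{P}_t \stackrel{d}{=} \varphi^R(t, \omega, P_0)$. Recall $\Omega^*$ as the $\theta^R$-invariant set with $\mathbb{P}^R(\Omega^*) = 1$ in eqn. (44) on which the almost
equilibrium $u^{\overline{A}}$ is defined. By Lemma 16, there exists $\Omega_1 \subset \Omega^*$ with $\mathbb{P}^R(\Omega_1) = 1$, such that

$$u^{\overline{A}}(\omega) \succeq Q, \quad \omega \in \Omega_1 \tag{46}$$

Define the random variable $X : \Omega^* \to \mathbb{S}_+^M$ by

$$X(\omega) = \begin{cases} P_0 & \text{if } \omega \in \Omega_1 \\ 0 & \text{if } \omega \notin \Omega_1 \end{cases} \tag{47}$$

Now choose $\alpha > 0$ sufficiently large such that

$$P_0 \preceq \alpha Q \tag{48}$$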

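This is possible because $Q \gg 0$. Then

$$0 \preceq X(\omega) \preceq \alpha u^{\overline{A}}(\omega), \quad \omega \in \Omega^* \tag{49}$$

Indeed, we have

$$0 \preceq P_0 = X(\omega) \preceq \alpha Q \preceq \alpha u^{\overline{A}}(\omega), \quad \omega \in \Omega_1 \tag{50}$$

and

$$0 = X(\omega) \preceq \alpha u^{\overline{A}}(\omega), \quad \omega \in \Omega^* \setminus \Omega_1 \tag{51}$$

We then have by the discussion preceding eqn. (44)

$$\lim_{t\to\infty} \varphi^R\left(t, \theta_{-t}\omega, X(\theta_{-t}\omega)\right) = u^{\overline{A}}(\omega), \quad \omega \in \Omega^* \tag{52}$$

Since convergence $\mathbb{P}^R$ a.s. implies convergence in distribution, we have

$$\varphi^R\left(t, \theta_{-t}\omega, X(\theta_{-t}\omega)\right) \Rightarrow \mu^{\overline{A}} \tag{53}$$

as $t \to \infty$, where $\Rightarrow$ denotes weak convergence or convergence in distribution. Then, by Lemma 23,
the sequence $\{\varphi^R(t, \omega, X(\omega))\}$ also converges in distribution to the unique stationary distribution
$\mu^{\overline{A}}$, i.e., as $t \to \infty$

$$\varphi^R\left(t, \omega, X(\omega)\right) \Rightarrow \mu^{\overline{A}} \tag{54}$$

Now, since $\mathbb{P}^R(\Omega_1) = 1$, by eqn. (47)

$$\varphi^R(t, \omega, P_0) = \varphi^R\left(t, \omega, X(\omega)\right), \quad \mathbb{P}^R \text{ a.s.}, \ t \in \mathbb{T}_+ \tag{55}$$

which implies

$$\varphi^R(t, \omega, P_0) \stackrel{d}{=} \varphi^R\left(t, \omega, X(\omega)\right), \quad t \in \mathbb{T}_+ \tag{56}$$

From eqns. (54,56), we then have $\varphi^R(t, \omega, P_0) \Rightarrow \mu^{\overline{A}}$, which together with the distributional equivalence
$\widetilde{P}_t \stackrel{d}{=} \varphi^R(t, \omega, P_0)$ noted above implies, as $t \to \infty$, $\widetilde{P}_t \Rightarrow \mu^{\overline{A}}$. ■
VI. Proofs of main results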

A. Proof of Theorem 8 

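Proof: By Theorem 12 we know that such a sequence $\{\widetilde{P}(t)\}$ converges weakly to $\mu^{\overline{A}}$ when
started from a deterministic initial condition. In the case $\widetilde{P}(0)$ is distributed as $\nu$, we note that, by the
independence of $\widetilde{P}(0)$ and the Markov chain $\{q(t)\}$,

$$E\left[g\left(\widetilde{P}(t)\right)\right] = \int E\left[g\left(\widetilde{P}(t)\right) \mid \widetilde{P}(0) = X\right] d\nu(X) \tag{57}$$

for any $g \in C_b(\mathbb{S}_+^M)$. Now, the distribution of the sequence $\{\widetilde{P}(t)\}$ conditioned on the event $\widetilde{P}(0) = X$ is
the same as that when the sequence starts with the deterministic initial condition $X$ (this is true because
$\widetilde{P}(0)$ is independent of $\{q(t)\}$). Hence by Theorem 12

$$\lim_{t\to\infty} E\left[g\left(\widetilde{P}(t)\right) \mid \widetilde{P}(0) = X\right] = \int g(Y)\, d\mu^{\overline{A}}(Y) \tag{58}$$

for all $X$. Since $g$ is bounded, the dominated convergence theorem and eqn. (57) lead to

$$\lim_{t\to\infty} E\left[g\left(\widetilde{P}(t)\right)\right] = \int g(Y)\, d\mu^{\overline{A}}(Y) \tag{59}$$

for all $g \in C_b(\mathbb{S}_+^M)$, and hence the required weak convergence follows.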

B. Proof of Theorem 10 

Proof: We prove Theorem 10 in the order 1), 3) and 2).
Consider any $\Gamma \in \mathcal{B}(\mathbb{S}_+^M)$. We estimate the probability $P(\widehat{P}_q(t) \in \Gamma)$. To this end, we note that

$$P\left(\widehat{P}_q(t) \in \Gamma\right) = \sum_{n=1}^{N} P\left(\widehat{P}_n(t) \in \Gamma\right) P(q = n) = \frac{1}{N} \sum_{n=1}^{N} P\left(\widehat{P}_n(t) \in \Gamma\right) \tag{60}$$

The first step holds because $q$ is independent of the sequences $\{\widehat{P}_n(t)\}$ for all $n$, and subsequently we
use that $q$ is uniformly distributed on $V$. Denoting by $\pi_t^{-1}$ the inverse of the permutation $\pi_t$, we have

$$P\left(\widehat{P}_n(t) \in \Gamma\right) = P\left(P_{\pi_t^{-1}(n)}(t) \in \Gamma\right) = \sum_{l=1}^{N} P\left(\{P_l(t) \in \Gamma\} \cap \{\pi_t^{-1}(n) = l\}\right) \tag{61}$$

Note, here, unlike in eqn. (60), we may not gain much by splitting the probabilities in the last step as
the events $\{P_l(t) \in \Gamma\}$ and $\{\pi_t^{-1}(n) = l\}$ are not independent. Combining eqns. (60,61), we have



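$$P\left(\widehat{P}_q(t) \in \Gamma\right) = \frac{1}{N} \sum_{n=1}^{N} \sum_{l=1}^{N} P\left(\{P_l(t) \in \Gamma\} \cap \{\pi_t^{-1}(n) = l\}\right) = \frac{1}{N} \sum_{l=1}^{N} \sum_{n=1}^{N} P\left(\{P_l(t) \in \Gamma\} \cap \{\pi_t^{-1}(n) = l\}\right) = \frac{1}{N} \sum_{l=1}^{N} P\left(P_l(t) \in \Gamma\right) \tag{62}$$

Note the last step follows from the fact that

$$\sum_{n=1}^{N} P\left(\{P_l(t) \in \Gamma\} \cap \{\pi_t^{-1}(n) = l\}\right) = P\left(P_l(t) \in \Gamma\right) \tag{63}$$

because the events $\{\pi_t^{-1}(n) = l\}$, $n = 1, \cdots, N$ are mutually exclusive and exhaustive, $\pi_t^{-1}$ being a
permutation.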

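Now consider a stationary Markov chain $\{\widetilde{p}(t)\}$ on $V$ with transition probability $\overline{A}$ and let $\{\widetilde{P}(t)\}$
be the sequence defined by

$$\widetilde{P}(t+1) = f_{\widetilde{p}(t)}\left(\widetilde{P}(t)\right), \quad t \in \mathbb{T}_+ \tag{64}$$

with initial condition $\widetilde{P}(0) = P(0)$. Then,

$$P\left(\widetilde{P}(t) \in \Gamma\right) = \sum_{l=1}^{N} P\left(\widetilde{P}(t) \in \Gamma \mid \widetilde{p}(0) = l\right) P\left(\widetilde{p}(0) = l\right) = \frac{1}{N} \sum_{l=1}^{N} P\left(\widetilde{P}(t) \in \Gamma \mid \widetilde{p}(0) = l\right) \tag{65}$$

By construction, the distribution of the sequence $\{\widetilde{P}(t)\}$ conditioned on the event $\{\widetilde{p}(0) = l\}$ is equivalent
to that of the sequence $\{P_l(t)\}$ and hence

$$P\left(\widetilde{P}(t) \in \Gamma \mid \widetilde{p}(0) = l\right) = P\left(P_l(t) \in \Gamma\right) \tag{66}$$

Hence by eqns. (62,65,66) we obtain

$$P\left(\widehat{P}_q(t) \in \Gamma\right) = P\left(\widetilde{P}(t) \in \Gamma\right) \tag{67}$$

Thus, for all $t$, $\widehat{P}_q(t) \stackrel{d}{=} \widetilde{P}(t)$. By Theorem 12, we then have the weak convergence of the sequence
$\{\widehat{P}_q(t)\}$ to $\mu^{\overline{A}}$.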



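For the third part, we note that for any $\Gamma \in \mathcal{B}(\mathbb{S}_+^M)$

$$\frac{1}{N} \sum_{n=1}^{N} P\left(\widehat{P}_n(t) \in \Gamma\right) = P\left(\widehat{P}_q(t) \in \Gamma\right) \tag{68}$$

due to the independence of $q$ from $\{A(t)\}$. Taking the limsup and noting the non-negativity of the terms,
we have for closed $F$,

$$\limsup_{t\to\infty} P\left(\widehat{P}_n(t) \in F\right) \le \limsup_{t\to\infty} \sum_{n=1}^{N} P\left(\widehat{P}_n(t) \in F\right) = N \limsup_{t\to\infty} \sum_{n=1}^{N} \frac{1}{N} P\left(\widehat{P}_n(t) \in F\right) = N \limsup_{t\to\infty} P\left(\widehat{P}_q(t) \in F\right) \le N \mu^{\overline{A}}(F)$$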



The proof of the second part involves an auxiliary construction and approximation arguments to relate the
limit properties of the sequences $\{P_n(t)\}$ to similar processes, where the underlying switching Markov
chain is stationary. To this end consider any strictly positive $s \in \mathbb{T}_+$. Recall the Markov chains $\{p_n(t)\}$
for $n = 1, \cdots, N$ with transition probability matrix $\overline{A}$ and initial state $p_n(0) = n$. The corresponding
sequences of interacting particle processes $\{P_n(t)\}$ are constructed, for each $n$, as:

$$P_n(t+1) = f_{p_n(t)}\left(P_n(t)\right) \tag{69}$$

with initial condition $P_n(0) = P(0)$. Let $f_0 : \mathbb{S}_+^M \to \mathbb{S}_+^M$ denote the Lyapunov operator

$$f_0(X) = \mathcal{F} X \mathcal{F}^T + Q \tag{70}$$

and note that the following ordering holds:

$$f_n(X) \preceq f_0(X), \quad \forall n \text{ and } X \in \mathbb{S}_+^M \tag{71}$$

For a given $s > 0$ chosen above and for all $n$, define the processes $\{P_n^s(t)\}_{t \ge s}$ by

$$P_n^s(t+1) = f_{p_n(t)}\left(P_n^s(t)\right) \tag{72}$$

with deterministic initial value $P_n^s(s) = f_0^s(P(0))$. By eqn. (71), for any $s$-tuple $(i_0, i_1, \cdots, i_{s-1})$ with
$i_r \in [1, \cdots, N]$ for $r = 0, \cdots, s-1$, we note that

$$f_{i_{s-1}} \circ f_{i_{s-2}} \circ \cdots \circ f_{i_0}(P(0)) \preceq f_0^s(P(0)) \tag{73}$$

and, hence, by the monotonicity of the Riccati operators, we conclude that for all $n$

$$P_n(t) \preceq P_n^s(t), \quad t \ge s \tag{74}$$

Also consider a stationary Markov chain $\{q(t)\}_{t \ge s}$ with transition probability $\overline{A}$, i.e., $q(0)$ is uniformly
distributed on $V$, and define the process $\{Q^s(t)\}_{t \ge s}$ by

$$Q^s(t+1) = f_{q(t)}\left(Q^s(t)\right) \tag{75}$$

with deterministic initial value

$$Q^s(s) = f_0^s(P(0)) \tag{76}$$

It is to be noted that by Theorem 12, the process $\{Q^s(t)\}$ converges weakly to $\mu^{\overline{A}}$, i.e.,

$$\lim_{t\to\infty} d_P\left(Q^s(t), \mu^{\overline{A}}\right) = 0 \tag{77}$$

where $d_P$ denotes the Prohorov metric. We now relate the limit properties of $\{P_n^s(t)\}$ to those of
$\{Q^s(t)\}$. For $t \ge s$ define the total variation distance between $P_n^s(t)$ and $Q^s(t)$ by

$$d_v\left(P_n^s(t), Q^s(t)\right) = \sup_{\Gamma \in \mathcal{B}(\mathbb{S}_+^M)} \left| P\left(P_n^s(t) \in \Gamma\right) - P\left(Q^s(t) \in \Gamma\right) \right| \tag{78}$$
Since for any $t$, the two sequences considered above assume values in a finite set, we define a set of
$(t - s)$-tuples $\mathcal{A}(\Gamma)$ by

$$\mathcal{A}(\Gamma) = \left\{(i_1, \cdots, i_{t-s}) \mid i_r \in [1, \cdots, N] \text{ for all } r \text{ and } f_{i_{t-s}} \circ \cdots \circ f_{i_1}\left(f_0^s(P(0))\right) \in \Gamma\right\} \tag{79}$$

It is clear that

$$\{P_n^s(t) \in \Gamma\} \iff \{(p_n(s), \cdots, p_n(t-1)) \in \mathcal{A}(\Gamma)\} \tag{80}$$
$$\{Q^s(t) \in \Gamma\} \iff \{(q(s), \cdots, q(t-1)) \in \mathcal{A}(\Gamma)\} \tag{81}$$

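We then have

$$P\left(P_n^s(t) \in \Gamma\right) - P\left(Q^s(t) \in \Gamma\right) = \sum_{i_1} P\left(p_n(s) = i_1\right) \sum_{(i_1, \cdots, i_{t-s}) \in \mathcal{A}(\Gamma)} \prod_{r=1}^{t-s-1} \overline{A}_{i_r i_{r+1}} - \sum_{i_1} P\left(q(s) = i_1\right) \sum_{(i_1, \cdots, i_{t-s}) \in \mathcal{A}(\Gamma)} \prod_{r=1}^{t-s-1} \overline{A}_{i_r i_{r+1}}$$
$$= \sum_{i_1} \left[P\left(p_n(s) = i_1\right) - P\left(q(s) = i_1\right)\right] \sum_{(i_1, \cdots, i_{t-s}) \in \mathcal{A}(\Gamma)} \prod_{r=1}^{t-s-1} \overline{A}_{i_r i_{r+1}}$$

where, for each fixed $i_1$, the inner sum runs over the tuples in $\mathcal{A}(\Gamma)$ whose first entry is $i_1$, and hence

$$\left| P\left(P_n^s(t) \in \Gamma\right) - P\left(Q^s(t) \in \Gamma\right) \right| \le \sum_{i_1} \left| P\left(p_n(s) = i_1\right) - P\left(q(s) = i_1\right) \right| \sum_{(i_1, \cdots, i_{t-s}) \in \mathcal{A}(\Gamma)} \prod_{r=1}^{t-s-1} \overline{A}_{i_r i_{r+1}}$$
$$\le \sum_{i_1} \left| P\left(p_n(s) = i_1\right) - P\left(q(s) = i_1\right) \right| \le \sum_{i_1} d_v\left(p_n(s), q(s)\right) \le N d_v\left(p_n(s), q(s)\right)$$

where we have used the fact that

$$\sum_{(i_1, \cdots, i_{t-s}) \in \mathcal{A}(\Gamma)} \prod_{r=1}^{t-s-1} \overline{A}_{i_r i_{r+1}} = \sum_{(i_1, \cdots, i_{t-s}) \in \mathcal{A}(\Gamma)} P\left((p_n(s+1), \cdots, p_n(t-1)) = (i_2, \cdots, i_{t-s}) \mid p_n(s) = i_1\right) \le 1$$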

We thus obtain

$$d_v\left(P_n^s(t), Q^s(t)\right) \le N d_v\left(p_n(s), q(s)\right), \quad \forall t \ge s \tag{82}$$

It is well known that the finite state Markov chain $\{p_n(s)\}$ converges weakly at a geometric rate to the
uniform measure, i.e., the measure induced by $q(s)$ for each $s$, and hence in variation. In other words,

$$\lim_{s\to\infty} d_v\left(p_n(s), q(s)\right) = 0 \tag{83}$$
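The geometric convergence behind eqn. (83) can be illustrated numerically. The sketch below is a toy check, not part of the proof: it uses a hypothetical 3-node symmetric stochastic matrix `A_BAR`, starts the chain at a fixed node (as $p_n(0) = n$), and records the total variation distance to the uniform law, computed as half the $\ell_1$ distance, which equals the supremum over events used in eqn. (78).

```python
def step(dist, A):
    """Propagate a row-vector distribution one step: dist -> dist * A."""
    n = len(A)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]


def tv_to_uniform(dist):
    """Total variation distance to the uniform law (half the L1 distance)."""
    n = len(dist)
    return 0.5 * sum(abs(p - 1.0 / n) for p in dist)


# Hypothetical symmetric stochastic matrix; its second eigenvalue is 0.25.
A_BAR = [[0.50, 0.25, 0.25],
         [0.25, 0.50, 0.25],
         [0.25, 0.25, 0.50]]

dist = [1.0, 0.0, 0.0]          # the chain starts at a fixed node
tv = []
for s in range(13):
    tv.append(tv_to_uniform(dist))
    dist = step(dist, A_BAR)
```

For this particular matrix the deviation from uniform shrinks by the factor 0.25 at every step, so the entries of `tv` decay geometrically; for a general irreducible aperiodic chain the rate would be governed by the second largest eigenvalue modulus.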

Thus, by eqn. (82), we have

$$\lim_{s\to\infty} \sup_{t \ge s} d_v\left(P_n^s(t), Q^s(t)\right) = 0 \tag{84}$$

and, since convergence in total variation implies weak convergence ([29]), we have

$$\lim_{s\to\infty} \sup_{t \ge s} d_P\left(P_n^s(t), Q^s(t)\right) = 0 \tag{85}$$

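Now consider $\varepsilon > 0$. Then there exists $s(\varepsilon)$, such that,

$$d_P\left(P_n^s(t), Q^s(t)\right) < \varepsilon/2, \quad s \ge s(\varepsilon), \ t \ge s \tag{86}$$

Since the sequence $\{Q^s(t)\}$ converges weakly to $\mu^{\overline{A}}$ for all $s$ (in particular for $s = s(\varepsilon)$), there exists
$t(\varepsilon) \ge s(\varepsilon)$ sufficiently large, such that,

$$d_P\left(Q^{s(\varepsilon)}(t), \mu^{\overline{A}}\right) < \varepsilon/2, \quad t \ge t(\varepsilon) \tag{87}$$

Then, an application of the triangle inequality for the metric $d_P$ leads to

$$d_P\left(P_n^{s(\varepsilon)}(t), \mu^{\overline{A}}\right) \le d_P\left(P_n^{s(\varepsilon)}(t), Q^{s(\varepsilon)}(t)\right) + d_P\left(Q^{s(\varepsilon)}(t), \mu^{\overline{A}}\right) < \varepsilon \tag{88}$$

for all $t \ge t(\varepsilon)$. Now, by definition,

$$d_P\left(P_n^{s(\varepsilon)}(t), \mu^{\overline{A}}\right) = \inf\left\{\delta > 0 \mid P\left(P_n^{s(\varepsilon)}(t) \in F\right) \le \mu^{\overline{A}}(F_\delta) + \delta \text{ for all closed } F \subset \mathbb{S}_+^M\right\} \tag{89}$$

where $F_\delta$ is defined as

$$F_\delta = \left\{X \in \mathbb{S}_+^M \mid \inf_{Y \in F} \|X - Y\| < \delta\right\} \tag{90}$$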



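Since, by eqn. (88), $d_P(P_n^{s(\varepsilon)}(t), \mu^{\overline{A}}) < \varepsilon$ for all $t \ge t(\varepsilon)$, we have, for any closed set $F$,

$$P\left(P_n^{s(\varepsilon)}(t) \in F\right) \le \mu^{\overline{A}}(F_\varepsilon) + \varepsilon, \quad t \ge t(\varepsilon) \tag{91}$$

In addition to $F$ being closed, let us assume that $F$ satisfies monotonicity, i.e., $X \in F$ implies $Y \in F$
for all $Y \succeq X$. By eqn. (74) we have

$$P_n(t) \preceq P_n^{s(\varepsilon)}(t), \quad t \ge t(\varepsilon) \ge s(\varepsilon) \tag{92}$$

and hence

$$P\left(P_n(t) \in F\right) \le P\left(P_n^{s(\varepsilon)}(t) \in F\right), \quad t \ge t(\varepsilon) \tag{93}$$

We then have from eqn. (91) for all $t \ge t(\varepsilon)$

$$P\left(P_n(t) \in F\right) \le \mu^{\overline{A}}(F_\varepsilon) + \varepsilon \tag{94}$$

Taking the limit superior as $t \to \infty$, we have

$$\limsup_{t\to\infty} P\left(P_n(t) \in F\right) \le \mu^{\overline{A}}(F_\varepsilon) + \varepsilon \tag{95}$$

The L.H.S. above is now independent of $t$ and, hence, of $\varepsilon$ through $t(\varepsilon)$. Since the above holds for arbitrary
$\varepsilon > 0$, moving to the limit as $\varepsilon \to 0$ yields

$$\limsup_{t\to\infty} P\left(P_n(t) \in F\right) \le \lim_{\varepsilon \to 0} \mu^{\overline{A}}(F_\varepsilon) = \mu^{\overline{A}}(F) \tag{96}$$

The last step follows from the continuity of the probability measure $\mu^{\overline{A}}$ and the fact that

$$\bigcap_{\varepsilon > 0} F_\varepsilon = F \tag{97}$$

for closed $F$. This establishes the result for general order preserving $F$. The result for sets of the form
$\{X \in \mathbb{S}_+^M \mid X \succeq \alpha I\}$ or $\{X \in \mathbb{S}_+^M \mid \|X\| \ge \alpha\}$ for $\alpha > 0$ follows, as they satisfy the general hypothesis
on $F$. ■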

VII. Concluding remarks 

The paper develops the gossip interactive Kalman filter (GIKF) for distributed Kalman filtering in 
sensor networks, when observation sampling and inter-sensor communication occur at the same time 
scale. Inter-sensor collaboration is achieved by intermittent exchange of filtering states. A traveling particle 
interpretation of the filtering states leads to a random dynamical system (RDS) formulation of the sequence 
of conditional error covariances. Under a weak detectability assumption, the estimation error process at 
each sensor stays stochastically bounded (irrespective of the instability in signal dynamics,) provided the 
network satisfies some weak connectivity conditions. Also, the network achieves weak consensus, i.e., 
the conditional error covariance (or the pathwise filtering error) at a randomly selected sensor converges
in distribution to a unique invariant measure $\mu^{\overline{A}}$. The invariant measure $\mu^{\overline{A}}$ depends on the network
connectivity process (the MAC protocol) through the mean $\overline{A}$ of the random adjacency matrix $A$.

Characterizing the invariant measure $\mu^{\overline{A}}$ as a functional of the matrix $\overline{A}$ is of interest for studying
the sensitivity of the mapping $\overline{A} \to \mu^{\overline{A}}$. This would lead to understanding the robustness of the above
filtering approach to perturbations in the communication policy, i.e., whether a small change in the MAC
protocol (a perturbation of $\overline{A}$) leads to a negligible change of $\mu^{\overline{A}}$, or the filtering performance changes
dramatically. Exploring such comparison principles for the mapping would lead to understanding the more
complicated problem of characterizing the invariant measure $\mu^{\overline{A}}$. Such a characterization, in general, is
difficult as there seems to be no direct way of obtaining a functional mapping $\overline{A}$ to $\mu^{\overline{A}}$. In fact, a
much simpler situation (Kalman filtering with intermittent observations) involving a single sensor with
observation packet losses demands the machinery of moderate deviations ([20]) and large random matrix
theory ([22]) for a characterization of the invariant measures.

Appendix A 

Random Dynamical Systems: Facts and Results 

We start by defining a random dynamical system (RDS). In the sequel, we follow the notation in [30], [27]. 

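Definition 17 (RDS) An RDS with (one-sided) time $\mathbb{T}_+$ and state space $\mathcal{X}$ is a pair $(\theta, \varphi)$ with the following
properties:
A) A metric dynamical system $\theta = (\Omega, \mathcal{F}, \mathbb{P}, \{\theta_t, t \in \mathbb{T}\})$ with two-sided time $\mathbb{T}$, i.e., a probability space $(\Omega, \mathcal{F}, \mathbb{P})$
with a family of transformations $\{\theta_t : \Omega \to \Omega\}_{t\in\mathbb{T}}$ such that

A.1) $\theta_0 = \mathrm{id}_\Omega$, $\theta_t \circ \theta_s = \theta_{t+s}$, $\forall t, s \in \mathbb{T}$

A.2) $(t, \omega) \mapsto \theta_t \omega$ is measurable.

A.3) $\theta_t \mathbb{P} = \mathbb{P}$ $\forall t \in \mathbb{T}$, i.e., $\mathbb{P}(\theta_t B) = \mathbb{P}(B)$ for all $B \in \mathcal{F}$ and all $t \in \mathbb{T}$.

B) A cocycle $\varphi$ over $\theta$ of continuous mappings of $\mathcal{X}$ with time $\mathbb{T}_+$, i.e., a measurable mapping

$$\varphi : \mathbb{T}_+ \times \Omega \times \mathcal{X} \to \mathcal{X}, \quad (t, \omega, X) \mapsto \varphi(t, \omega, X) \tag{98}$$

such that

B.1) The mapping $X \mapsto \varphi(t, \omega, X) \equiv \varphi(t, \omega)X$ is continuous in $X$ for every $t \in \mathbb{T}_+$ and $\omega \in \Omega$.
B.2) The mappings $\varphi(t, \omega) \equiv \varphi(t, \omega, \cdot)$ satisfy the cocycle property:

$$\varphi(0, \omega) = \mathrm{id}_{\mathcal{X}}, \quad \varphi(t + s, \omega) = \varphi(t, \theta_s \omega) \circ \varphi(s, \omega) \tag{99}$$

for all $t, s \in \mathbb{T}_+$ and $\omega \in \Omega$.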

Although we consider in this paper discrete time RDS, the general notion of RDS, as defined in [27], applies
equally well to dynamical systems with continuous time. In the above definition, the randomness is captured by
the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and iterates indexed by $\omega$ indicate pathwise construction. For example, if $X_0$ is
the deterministic initial state of the system of interest at time $t = 0$, the random state at time $t \in \mathbb{T}_+$ is given by

$$X_t(\omega) = \varphi(t, \omega, X_0) \tag{100}$$

The measurability assumptions in the definition above guarantee that the random state $X_t$ is a well-defined
random variable. Also, note that the iterates are defined for non-negative (one-sided) time; however, the family
of transformations $\{\theta_t\}$ is two-sided, which is purely for technical convenience, as will be seen later.
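The cocycle property in eqn. (99) can be made tangible for the Riccati RDS of Section IV with a small sketch. Everything concrete here is an assumption for illustration: the state is scalar, the parameters are hypothetical, and `omega` is one fixed two-sided path of sensor indices rather than a random draw. The function `phi` composes the operators $f_{\omega_0}, \ldots, f_{\omega_{t-1}}$ as in eqn. (36), and `shift` implements the left shift $\theta_s$.

```python
# Hypothetical scalar two-sensor model (not from the paper).
F, Q = 1.2, 1.0
C = [1.0, 0.0]                   # sensor 0 observes, sensor 1 is blind
R = [1.0, 1.0]


def riccati(n, x):
    """Scalar Riccati operator f_n(x)."""
    return F * F * x + Q - (F * F * x * x * C[n] ** 2) / (C[n] ** 2 * x + R[n])


def omega(t):
    """A fixed two-sided path of sensor indices, defined for every integer t."""
    return 0 if t % 3 == 0 else 1


def shift(path, s):
    """Left shift: (theta_s path)(t) = path(t + s)."""
    return lambda t: path(t + s)


def phi(t, path, x):
    """Cocycle: apply f_{path(0)}, ..., f_{path(t-1)} in that order."""
    for k in range(t):
        x = riccati(path(k), x)
    return x


# phi(t + s, omega) versus phi(t, theta_s omega) composed with phi(s, omega).
lhs = phi(7, omega, 0.5)
rhs = phi(4, shift(omega, 3), phi(3, omega, 0.5))
```

Both sides apply the same seven operators in the same order, so `lhs` and `rhs` agree; the shift only re-indexes where along the path the second leg starts.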

Some results from RDS theory We summarize terminology and notions used in the RDS literature (see [26],
[27] for details).

Consider a generic RDS $(\theta, \varphi)$ with state space $\mathcal{X}$ as in Definition 17. In the following we assume that $\mathcal{X}$ is a
non-empty subset of a real Banach space $V$ with a closed, convex, solid, normal (w.r.t. the Banach space norm),
minihedral cone $V_+$. We denote by $\preceq$ the partial order induced by $V_+$ in $\mathcal{X}$, and $\ll$ denotes the corresponding
strong order. Although the development that follows may hold for arbitrary $\mathcal{X} \subset V$, in the sequel we assume
$\mathcal{X} = V_+$ (which is true for the RDS $(\theta^R, \varphi^R)$ modeling the RARE).

Definition 18 (Order-Preserving RDS) An RDS $(\theta, \varphi)$ with state space $V_+$ is called order-preserving if

$$X \preceq Y \implies \varphi(t, \omega, X) \preceq \varphi(t, \omega, Y), \quad \forall t \in \mathbb{T}_+, \ \omega \in \Omega, \ X, Y \in V_+ \tag{101}$$

Definition 19 (Sublinearity) An order-preserving RDS $(\theta, \varphi)$ with state space $V_+$ is called sublinear if for every
$X \in V_+$ and $\lambda \in (0, 1)$ we have

$$\lambda \varphi(t, \omega, X) \preceq \varphi(t, \omega, \lambda X), \quad \forall t > 0, \ \omega \in \Omega \tag{102}$$

The RDS is said to be strictly sublinear if strict inequality in eqn. (102) holds for $X \in \mathrm{int}\, V_+$, i.e., for $X \in \mathrm{int}\, V_+$,

$$\lambda \varphi(t, \omega, X) \prec \varphi(t, \omega, \lambda X), \quad \forall t > 0, \ \omega \in \Omega \tag{103}$$

and strongly sublinear if in addition to eqn. (102), we have

$$\lambda \varphi(t, \omega, X) \ll \varphi(t, \omega, \lambda X), \quad \forall t > 0, \ \omega \in \Omega, \ X \in \mathrm{int}\, V_+ \tag{104}$$

Definition 20 (Equilibrium) A random variable $u : \Omega \to V_+$ is called an equilibrium (fixed point, stationary
solution) of the RDS $(\theta, \varphi)$ if it is invariant under $\varphi$, i.e.,

$$\varphi(t, \omega, u(\omega)) = u(\theta_t \omega), \quad \forall t \in \mathbb{T}_+ \tag{105}$$

In case eqn. (105) holds for all $\omega \in \Omega$, except on a set of $\mathbb{P}$ measure zero, we call $u$ an almost equilibrium.

Since the transformations $\{\theta_t\}$ are measure-preserving, i.e., $\theta_t \mathbb{P} = \mathbb{P}$, $\forall t$, we have

$$u(\theta_t \omega) \stackrel{d}{=} u(\omega), \quad \forall t \tag{106}$$

Thus eqn. (105), in particular, implies that, for an almost equilibrium $u$, the iterates in the sequence $\{\varphi(t, \omega, u(\omega))\}_{t\in\mathbb{T}_+}$
all have the same distribution, which is the distribution of $u$.

Definition 21 (Part) The equivalence classes in $V_+$ under the equivalence relation defined by $X \sim Y$ if there exists
$\alpha > 1$ such that $\alpha^{-1} X \preceq Y \preceq \alpha X$ are called parts of $V_+$.

We call the part $C_v$ generated by a random variable $v : \Omega \to V_+$ the collection of random variables
$u : \Omega \to V_+$ such that there exists deterministic $\alpha_u > 1$ with

$$\alpha_u^{-1} v(\omega) \preceq u(\omega) \preceq \alpha_u v(\omega), \quad \forall \omega \in \Omega \tag{107}$$

Definition 22 (Orbit) For a random variable $u : \Omega \to V_+$ we define the forward orbit $\eta^f_u(\omega)$ emanating from $u(\omega)$
as the random set $\{\varphi(t, \omega, u(\omega))\}_{t\in\mathbb{T}_+}$. The forward orbit gives the sequence of iterates of the RDS starting at $u$.

Although $\eta^f_u$ is the object of practical interest, for technical convenience (as will be seen later) we also define the
pull-back orbit $\eta^b_u(\omega)$ emanating from $u$ as the random set $\{\varphi(t, \theta_{-t}\omega, u(\theta_{-t}\omega))\}_{t\in\mathbb{T}_+}$.

The reason for defining the pull-back orbit is that it is comparatively convenient to establish asymptotic properties for
$\eta^b_u$. However, analyzing $\eta^b_u$ leads to understanding asymptotic distributional properties for $\eta^f_u$, because the random
sequences $\{\varphi(t, \omega, u(\omega))\}_{t\in\mathbb{T}_+}$ and $\{\varphi(t, \theta_{-t}\omega, u(\theta_{-t}\omega))\}_{t\in\mathbb{T}_+}$ are equivalent in distribution. In other words,

$$\varphi(t, \omega, u(\omega)) \stackrel{d}{=} \varphi(t, \theta_{-t}\omega, u(\theta_{-t}\omega)), \quad \forall t \in \mathbb{T}_+ \tag{108}$$

This follows from the fact that $\theta_t \mathbb{P} = \mathbb{P}$, $\forall t \in \mathbb{T}$. Thus, in particular, we have the following assertion.

Lemma 23 Let the sequence $\{\varphi(t, \theta_{-t}\omega, u(\theta_{-t}\omega))\}_{t\in\mathbb{T}_+}$ converge in distribution to a measure $\mu$ on $V_+$, where
$u : \Omega \to V_+$ is a random variable. Then the sequence $\{\varphi(t, \omega, u(\omega))\}_{t\in\mathbb{T}_+}$ also converges in distribution to the
measure $\mu$.
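The practical difference between the two orbits can be seen pathwise in a toy setting. The sketch below is an illustration under loud assumptions: a scalar state, hypothetical parameters, and one fixed (here periodic) two-sided path `omega` instead of a random one. Along that path the pull-back values $\varphi(t, \theta_{-t}\omega, x)$ settle to a limit as $t$ grows, while the forward values $\varphi(t, \omega, x)$ keep oscillating.

```python
# Hypothetical scalar two-sensor model (not from the paper).
F, Q = 1.2, 1.0
C = [1.0, 0.0]                   # sensor 0 observes, sensor 1 is blind
R = [1.0, 1.0]


def riccati(n, x):
    """Scalar Riccati operator f_n(x)."""
    return F * F * x + Q - (F * F * x * x * C[n] ** 2) / (C[n] ** 2 * x + R[n])


def omega(t):
    """A fixed two-sided path of sensor indices, defined for every integer t."""
    return 0 if t % 3 == 0 else 1


def phi(t, path, x):
    """Cocycle: apply f_{path(0)}, ..., f_{path(t-1)} in that order."""
    for k in range(t):
        x = riccati(path(k), x)
    return x


def pull_back(t, x):
    """phi(t, theta_{-t} omega, x): start at time -t, run up to time 0."""
    return phi(t, lambda k: omega(k - t), x)


def forward(t, x):
    """phi(t, omega, x): start at time 0, run up to time t."""
    return phi(t, omega, x)


pull = [pull_back(t, 0.0) for t in range(30, 41)]
fwd = [forward(t, 0.0) for t in range(30, 41)]
```

In the pull-back orbit the most recent operators are held fixed and ever older ones are prepended, so the influence of the initial value is contracted away; in the forward orbit the last operator applied keeps changing, so only the distribution, not the path, can stabilize.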

We now introduce notions of boundedness of RDS, which will be used in the sequel.

Definition 24 (Boundedness) Let $a : \Omega \to V_+$ be a random variable. The pull-back orbit $\eta^b_a(\omega)$ emanating from
$a$ is said to be bounded on $U \in \mathcal{F}$ if there exists a random variable $C$ on $U$ such that

$$\left\|\varphi(t, \theta_{-t}\omega, a(\theta_{-t}\omega))\right\| \le C(\omega), \quad \forall t \in \mathbb{T}_+, \ \omega \in U \tag{109}$$

Definition 25 (Conditionally Compact RDS) An RDS $(\theta, \varphi)$ in $V_+$ is said to be conditionally compact if for any
$U \in \mathcal{F}$ and pull-back orbit $\eta^b_a(\omega)$ which is bounded on $U$ there exists a family of compact sets $\{K(\omega)\}_{\omega\in U}$ such
that

$$\lim_{t\to\infty} \mathrm{dist}\left(\varphi(t, \theta_{-t}\omega, a(\theta_{-t}\omega)), K(\omega)\right) = 0, \quad \omega \in U \tag{110}$$

It is to be noted that conditional compactness is a topological property of the space $V_+$.
We now state a limit set dichotomy result for a class of sublinear, order-preserving RDS.
Theorem 26 (Corollary 4.3.1. in [27]) Let $V$ be a separable Banach space with a normal solid cone $V_+$. Assume
that $(\theta, \varphi)$ is a strongly sublinear conditionally compact order-preserving RDS over an ergodic metric dynamical
system $\theta$. Suppose that $\varphi(t, \omega, 0) \gg 0$ for all $t > 0$ and $\omega \in \Omega$. Then precisely one of the following applies:

(a) For any $X \in V_+$ we have

$$\mathbb{P}\left(\lim_{t\to\infty} \left\|\varphi(t, \theta_{-t}\omega, X)\right\| = \infty\right) = 1 \tag{111}$$

(b) There exists a unique almost equilibrium $u(\omega) \gg 0$ defined on a $\theta$-invariant set $\Omega^* \in \mathcal{F}$ with $\mathbb{P}(\Omega^*) = 1$
such that for any random variable $v(\omega)$ possessing the property $0 \preceq v(\omega) \preceq \alpha u(\omega)$ for all $\omega \in \Omega^*$ and
deterministic $\alpha > 0$, the following holds:

$$\lim_{t\to\infty} \varphi(t, \theta_{-t}\omega, v(\theta_{-t}\omega)) = u(\omega), \quad \omega \in \Omega^* \tag{112}$$

Appendix B 
Proofs in Section V 

Proof of Lemma 14 The proof is obtained by constructing an approximate filter with suboptimal performance,
and then bounding its error by using the rank condition on the Grammian $G_{w_0}$. We detail such a construction now.

Consider $\ell$ steps $t = 1,\cdots,\ell$ of the linear time-varying signal/observation model given by (2) and (3), where in (3)
we index the observation matrix as $\mathcal{C}_{n_t}$, where $n_t$ indicates the current state of the walk $w_0$. The signal vector
$x_t \in \mathbb{R}^M$ has initial state $x_1$, a Gaussian random variable with known mean $\bar{x}_1$ and covariance $X \in \mathbb{S}_+^M$. The
system noise process $\{w_t\}$ is uncorrelated zero mean Gaussian with covariance $\mathcal{Q}$. The observation noise process
$\{v_t\}_{t=1}^{\ell}$ is uncorrelated zero mean Gaussian with time varying error covariance $\mathcal{R}_{n_t}$ and independent of the initial
signal state and the system noise process. By the above construction, the optimal estimate of the signal state $x_t$
at time $t$, based on observations till that time, is given by the Kalman filter initialized with $X$ as the predicted
conditional error covariance at time $t = 1$. In other words, the optimal m.m.s.e. state estimator (predictor form)

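$$\hat{x}_{w_0}(t) = \mathbb{E}\left[x_t \mid \{y_s\}_{1\leq s<t}\right] \qquad (113)$$

of $x_t$ based on observations $\{y_s\}_{1\leq s<t}$ for $1 \leq t \leq \ell+1$ can be recursively constructed through the Kalman filter
and the corresponding predicted conditional error covariance sequence $\{P_{w_0}(t)\}_{1\leq t\leq \ell+1}$ satisfies the recursion:

$$P_{w_0}(t+1) = \mathcal{F}P_{w_0}(t)\mathcal{F}^T + \mathcal{Q} - \mathcal{F}P_{w_0}(t)\mathcal{C}_{n_t}^T\left(\mathcal{C}_{n_t}P_{w_0}(t)\mathcal{C}_{n_t}^T + \mathcal{R}_{n_t}\right)^{-1}\mathcal{C}_{n_t}P_{w_0}(t)\mathcal{F}^T \qquad (114)$$

with initial condition $P_{w_0}(1) = X$. We then have

$$P_{w_0}(\ell+1) = f_{n_\ell} \circ \cdots \circ f_{n_1}(X) \qquad (115)$$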

the R.H.S. being the desired functional form in eqn. (40), i.e., $P_{w_0}(\ell+1) = g_{w_0}(X)$. Since for a Kalman filter with
deterministic system/observation matrices, the conditional error covariance is equal to the unconditional one and
the fact that the Kalman filter minimizes any positive definite form of the estimation error, for a generic estimator
$h$ of $x_{\ell+1}$ based on $\{y_s\}_{1\leq s\leq \ell}$ we have

$$P_{w_0}(\ell+1) \preceq \mathbb{E}\left[(x_{\ell+1} - h)(x_{\ell+1} - h)^T\right] \qquad (116)$$



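where $\preceq$ refers to the partial order on $\mathbb{S}_+^M$. In order to upper bound the functional $g_{w_0}$, we now construct a suboptimal
state estimator with a guaranteed estimation performance. To this end, define the modified Grammian

$$\tilde{G}_{w_0} = \sum_{t=1}^{\ell} \left(\mathcal{F}^{t-1}\right)^T \mathcal{C}_{n_t}^T \mathcal{R}_{n_t}^{-1} \mathcal{C}_{n_t} \mathcal{F}^{t-1} \qquad (117)$$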



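(Footnote 12: A set $A \in \mathcal{F}$ is called $\theta$-invariant if $\theta_t A = A$ for all $t \in \mathbb{T}$.)

We note that $\tilde{G}_{w_0}$ is invertible by the invertibility of $G_{w_0}$ and the noise covariances $\mathcal{R}_{n_t}$. Define the suboptimal
estimator of $x_{\ell+1}$ by:

$$\tilde{x}_{w_0}(\ell+1) = \mathcal{F}^{\ell}\,\tilde{G}_{w_0}^{-1} \sum_{t=1}^{\ell} \left(\mathcal{F}^{t-1}\right)^T \mathcal{C}_{n_t}^T \mathcal{R}_{n_t}^{-1}\, y_t \qquad (118)$$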

based on observations $\{y_s\}_{1\leq s\leq \ell}$. Using the fact that

$$x_t = \mathcal{F}^{t-1}x_1 + \sum_{s=1}^{t-1} \mathcal{F}^{t-1-s} w_s, \quad 1 \leq t \leq \ell+1 \qquad (119)$$

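we have from eqn. (118)

$$\begin{aligned}
\tilde{x}_{w_0}(\ell+1) &= \mathcal{F}^{\ell}\tilde{G}_{w_0}^{-1} \sum_{t=1}^{\ell} \left(\mathcal{F}^{t-1}\right)^T \mathcal{C}_{n_t}^T \mathcal{R}_{n_t}^{-1} \mathcal{C}_{n_t} \mathcal{F}^{t-1} x_1
 + \mathcal{F}^{\ell}\tilde{G}_{w_0}^{-1} \sum_{t=1}^{\ell} \left(\mathcal{F}^{t-1}\right)^T \mathcal{C}_{n_t}^T \mathcal{R}_{n_t}^{-1} \mathcal{C}_{n_t} \sum_{s=1}^{t-1} \mathcal{F}^{t-1-s} w_s \\
&\quad + \mathcal{F}^{\ell}\tilde{G}_{w_0}^{-1} \sum_{t=1}^{\ell} \left(\mathcal{F}^{t-1}\right)^T \mathcal{C}_{n_t}^T \mathcal{R}_{n_t}^{-1} v_t \\
&= \mathcal{F}^{\ell} x_1
 + \mathcal{F}^{\ell}\tilde{G}_{w_0}^{-1} \sum_{t=1}^{\ell} \left(\mathcal{F}^{t-1}\right)^T \mathcal{C}_{n_t}^T \mathcal{R}_{n_t}^{-1} \mathcal{C}_{n_t} \sum_{s=1}^{t-1} \mathcal{F}^{t-1-s} w_s
 + \mathcal{F}^{\ell}\tilde{G}_{w_0}^{-1} \sum_{t=1}^{\ell} \left(\mathcal{F}^{t-1}\right)^T \mathcal{C}_{n_t}^T \mathcal{R}_{n_t}^{-1} v_t
\end{aligned}$$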

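The filtering error is then given by

$$\begin{aligned}
e_{w_0}(\ell+1) &= x_{\ell+1} - \tilde{x}_{w_0}(\ell+1) \\
&= \sum_{s=1}^{\ell} \mathcal{F}^{\ell-s} w_s
 - \mathcal{F}^{\ell}\tilde{G}_{w_0}^{-1} \sum_{t=1}^{\ell} \left(\mathcal{F}^{t-1}\right)^T \mathcal{C}_{n_t}^T \mathcal{R}_{n_t}^{-1} \mathcal{C}_{n_t} \sum_{s=1}^{t-1} \mathcal{F}^{t-1-s} w_s
 - \mathcal{F}^{\ell}\tilde{G}_{w_0}^{-1} \sum_{t=1}^{\ell} \left(\mathcal{F}^{t-1}\right)^T \mathcal{C}_{n_t}^T \mathcal{R}_{n_t}^{-1} v_t \qquad (120)
\end{aligned}$$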

We note that the error above is independent of the initial state $x_1$ (and hence the covariance $X$) of the system,
and the mean square boundedness of the process noise $\{w_t\}$ and observation noise $\{v_t\}$ imply the existence of a
constant $\alpha_0 > 0$, such that,

$$\mathbb{E}\left[e_{w_0}(\ell+1)\,e_{w_0}(\ell+1)^T\right] \preceq \alpha_0 I \qquad (121)$$

The Lemma then follows by the optimality of the Kalman filter, as stated in eqn. (116). 
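The uniform bound in Lemma 14 can be illustrated numerically. The following is a minimal sketch under assumed values that do not come from the paper: a 2-dimensional diagonal unstable system, two hypothetical sensors that each observe one coordinate, and unit noise covariances. It composes the Riccati maps along a walk and compares a walk whose Grammian is invertible, (1, 2), with one whose Grammian is singular, (2, 2).

```python
# Illustrative values only (assumptions, not taken from the paper).
F = [[1.2, 0.0], [0.0, 1.1]]          # assumed unstable signal matrix
Q = [[1.0, 0.0], [0.0, 1.0]]          # assumed process-noise covariance
C = {1: [1.0, 0.0], 2: [0.0, 1.0]}    # assumed scalar observation rows, one per sensor
R = {1: 1.0, 2: 1.0}                  # assumed observation-noise variances


def matmul(A, B):
    """Product of two small dense matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def riccati(n, X):
    """One Riccati update f_n(X) for a scalar observation at sensor n.

    X is assumed symmetric, so X c^T doubles as (c X)^T.
    """
    c = C[n]
    Xc = [sum(X[i][k] * c[k] for k in range(2)) for i in range(2)]    # X c^T
    innov = sum(c[i] * Xc[i] for i in range(2)) + R[n]                # c X c^T + R_n
    FXc = [sum(F[i][k] * Xc[k] for k in range(2)) for i in range(2)]  # F X c^T
    FXFt = matmul(matmul(F, X), transpose(F))
    return [[FXFt[i][j] + Q[i][j] - FXc[i] * FXc[j] / innov
             for j in range(2)] for i in range(2)]


def g(walk, X):
    """Composition f_{n_l} o ... o f_{n_1}(X) along walk = (n_1, ..., n_l)."""
    for n in walk:
        X = riccati(n, X)
    return X


def trace(X):
    return X[0][0] + X[1][1]


def scaled_identity(s):
    return [[s, 0.0], [0.0, s]]


if __name__ == "__main__":
    for s in (1.0, 1e3, 1e6):
        X0 = scaled_identity(s)
        print(s, trace(g((1, 2), X0)), trace(g((2, 2), X0)))
```

In this toy setting the trace along the walk (1, 2) stays below a fixed constant however large the initial covariance is, whereas along (2, 2) it grows with the initial covariance because the first coordinate is never observed.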

Proof of Lemma 15 In case $\mathcal{F}$ is stable, the claim is obvious, as the suboptimal estimate of 0 at each sensor
for all time is stochastically bounded. So, in the sequel we assume $\mathcal{F}$ is unstable.

The proof is somewhat technical and mainly uses the uniform boundedness of the composition of Riccati operators
in Lemma 14 and the ergodicity of the underlying switching Markov chain $\{p(t)\}_{t\in\mathbb{T}_+}$. From Lemma 14 it follows
that a successive application of $\ell$ Riccati maps (in the composition order $f_{n_\ell} \circ \cdots \circ f_{n_1}$) brings the iterate into the
conic interval $[0, \alpha_0 I]$ irrespective of its initial value. The approach is to relate the probability of large exceedance
of $P(t)$ to the hitting time statistics of a modified Markov chain. We detail it below.

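First, we note that the regularity of the distributions of $P(t)$ for every $t$ implies that it suffices to show

$$\lim_{J\to\infty} \sup_{t\geq t_0} \mathbb{P}\left(\|P(t)\| > J\right) = 0 \qquad (122)$$

for some arbitrarily large $t_0 \in \mathbb{T}_+$. For every $n$, the Riccati update is upper bounded by the Lyapunov operator,
i.e.,

$$f_n(X) \preceq \mathcal{F}X\mathcal{F}^T + \mathcal{Q}, \quad \forall X \in \mathbb{S}_+^M \qquad (123)$$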

For sufficiently large $J > 0$, define

$$k(J) = \max\left\{k \in \mathbb{T}_+ \;\middle|\; \alpha^{2k}\alpha_0 + \frac{\alpha^{2k}-1}{\alpha^2-1}\,\|\mathcal{Q}\| < J\right\} \qquad (124)$$

where $\alpha = \|\mathcal{F}\|$. Since $\mathcal{F}$ is unstable ($\alpha > 1$), we note that $k(J) \to \infty$ as $J \to \infty$.
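As a rough illustration of the definition in eqn. (124), the sketch below evaluates $k(J)$ by direct search. The function names and the numerical values of $\alpha$, $\alpha_0$ and $\|\mathcal{Q}\|$ are hypothetical choices made for the example; only the formula itself is taken from the text.

```python
def bound(k, alpha, alpha0, q_norm):
    """alpha^(2k) * alpha0 + (alpha^(2k) - 1) / (alpha^2 - 1) * ||Q||, for alpha > 1."""
    a2k = alpha ** (2 * k)
    return a2k * alpha0 + (a2k - 1.0) / (alpha ** 2 - 1.0) * q_norm


def k_of_J(J, alpha, alpha0, q_norm):
    """Largest integer k >= 0 with bound(k) < J, or None if J is too small."""
    if alpha <= 1.0:
        raise ValueError("the sketch assumes alpha = ||F|| > 1")
    if bound(0, alpha, alpha0, q_norm) >= J:
        return None  # J is not yet 'sufficiently large'
    k = 0
    # bound(k) is strictly increasing in k when alpha > 1, so a linear search terminates.
    while bound(k + 1, alpha, alpha0, q_norm) < J:
        k += 1
    return k


if __name__ == "__main__":
    for J in (12.0, 17.0, 1e3, 1e6):
        print(J, k_of_J(J, alpha=1.2, alpha0=7.0, q_norm=1.0))
```

Because the bound grows geometrically in $k$, $k(J)$ grows only logarithmically in $J$, but it is unbounded, which is all the argument requires.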

We introduce additional notation here. For integers $t_0, t_1 \geq \ell$, the phrase "there exists a $(n_1, n_2, \cdots, n_\ell)$ cycle
in the interval $[t_0, t_1]$" indicates the existence of an integer $t_0 \leq t' \leq t_1$, such that,

$$p(t' - \ell + s) = n_s, \quad 1 \leq s \leq \ell \qquad (125)$$

where $\{p(t)\}_{t\in\mathbb{T}_+}$ is the switching Markov chain.

We now make the following claim for relating the probabilities of interest for sufficiently large $t$:

$$\mathbb{P}\left(\|P(t)\| > J\right) \leq \mathbb{P}\left(\text{no } (n_1, n_2, \cdots, n_\ell) \text{ cycle exists in } [t - k(J), t]\right) \qquad (126)$$
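The cycle notation introduced above can be made concrete with a small sketch. The helper `has_cycle` and the sample path are hypothetical and serve only as an illustration: a path is stored as a dictionary from integer times to chain states, and a cycle in $[t_0, t_1]$ is an occurrence of the pattern $(n_1, \cdots, n_\ell)$ whose last entry falls at some time $t'$ in that interval.

```python
def has_cycle(path, pattern, t0, t1):
    """Return True if some t' in [t0, t1] satisfies
    path[t' - l + s] == pattern[s - 1] for all 1 <= s <= l,
    where l = len(pattern).

    `path` is a dict mapping integer times to states; times missing
    from the dict never match (states are assumed not to be None).
    """
    l = len(pattern)
    for t_end in range(t0, t1 + 1):
        if all(path.get(t_end - l + s) == pattern[s - 1] for s in range(1, l + 1)):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical sample path of the switching chain, indexed from time 1.
    sample = {1: 3, 2: 1, 3: 2, 4: 2, 5: 1, 6: 2}
    print(has_cycle(sample, (1, 2), 3, 3))   # pattern occupies times 2 and 3
    print(has_cycle(sample, (1, 2), 4, 5))   # no occurrence ends at time 4 or 5
```

Note that the window constrains only the end time $t'$ of the occurrence; the pattern itself may start up to $\ell - 1$ steps before $t_0$.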



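Indeed, assume on the contrary that a $(n_1, \cdots, n_\ell)$ cycle exists in the interval $[t - k(J), t]$. Then there exists
$t' \in [t - k(J), t]$, such that,

$$p(t' - \ell + s) = n_s, \quad 1 \leq s \leq \ell \qquad (127)$$

This implies

$$P(t') = f_{n_\ell} \circ \cdots \circ f_{n_1}\left(P(t' - \ell + 1)\right) \qquad (128)$$

and hence by Lemma 14

$$P(t') \preceq \alpha_0 I \qquad (129)$$

which holds irrespective of the value of $P(t' - \ell + 1)$. By eqn. (123) we note that

$$P(s) \preceq \mathcal{F}P(s-1)\mathcal{F}^T + \mathcal{Q}, \quad \forall s \qquad (130)$$

Continuing the recursion and noting $P(t') \preceq \alpha_0 I$,

$$\|P(t)\| \leq \alpha^{2(t-t')}\,\|P(t')\| + \frac{\alpha^{2(t-t')}-1}{\alpha^2-1}\,\|\mathcal{Q}\| \leq \alpha^{2(t-t')}\alpha_0 + \frac{\alpha^{2(t-t')}-1}{\alpha^2-1}\,\|\mathcal{Q}\|$$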



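Since $(t - t') \leq k(J)$, it follows from the above

$$\|P(t)\| \leq \alpha^{2(t-t')}\alpha_0 + \frac{\alpha^{2(t-t')}-1}{\alpha^2-1}\,\|\mathcal{Q}\| \leq \alpha^{2k(J)}\alpha_0 + \frac{\alpha^{2k(J)}-1}{\alpha^2-1}\,\|\mathcal{Q}\| < J$$

where the last step follows from the definition of $k(J)$ (eqn. (124)). We thus note that the existence of a $(n_1, \cdots, n_\ell)$
cycle in $[t - k(J), t]$ implies $\|P(t)\| < J$, i.e., we have the event inclusion:

$$\left\{\text{there exists a } (n_1, \cdots, n_\ell) \text{ cycle in } [t - k(J), t]\right\} \subset \left\{\|P(t)\| < J\right\} \qquad (131)$$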
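The recursion step behind the norm bound is stated without intermediate steps. A possible way to spell it out is sketched below, under the assumptions that $\|\cdot\|$ is the spectral norm, that $t'$ denotes the end time of the cycle, and that $m = t - t' \geq 0$; the labels in the right-hand column are explanatory and not part of the original derivation.

```latex
% Assumes the spectral norm, \alpha = \|\mathcal{F}\| > 1, and m = t - t' \ge 0.
\begin{align*}
\|P(s)\| &\le \alpha^{2}\,\|P(s-1)\| + \|\mathcal{Q}\|
  && \text{(from (130), since } 0 \preceq A \preceq B \Rightarrow \|A\| \le \|B\|\text{)} \\
\|P(t)\| &\le \alpha^{2m}\,\|P(t')\| + \sum_{j=0}^{m-1} \alpha^{2j}\,\|\mathcal{Q}\|
  && \text{(iterating } m \text{ times)} \\
 &= \alpha^{2m}\,\|P(t')\| + \frac{\alpha^{2m}-1}{\alpha^{2}-1}\,\|\mathcal{Q}\|
  && \text{(geometric sum, } \alpha \neq 1\text{)} \\
 &\le \alpha^{2m}\,\alpha_0 + \frac{\alpha^{2m}-1}{\alpha^{2}-1}\,\|\mathcal{Q}\|
  && \text{(using (129))}
\end{align*}
```

The right-hand side is increasing in $m$ when $\alpha > 1$, which is why replacing $m$ by its upper bound $k(J)$ preserves the inequality.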



The claim in eqn. (126) follows. Thus estimating the probability on the L.H.S. of eqn. (126) reduces to estimating
the probability that no $(n_1, \cdots, n_\ell)$ cycle exists in $[t - k(J), t]$. To this end we construct another Markov chain $\{z(t)\}_{t\geq \ell}$.
The state space $\mathcal{Z}$ is a subset of $V^\ell$ given by:

$$\mathcal{Z} = \left\{z = (i_1, i_2, \cdots, i_\ell) \;\middle|\; A_{i_j, i_{j+1}} > 0,\ 1 \leq j < \ell\right\} \qquad (132)$$

The dynamics of the Markov chain $\{z(t)\}_{t\geq \ell}$ is given in terms of the Markov chain $\{p(t)\}_{t\in\mathbb{T}_+}$ as follows:

$$z(t) = \left(p(t - \ell + 1),\, p(t - \ell + 2),\, \cdots,\, p(t)\right) \qquad (133)$$

From the dynamics of $\{p(t)\}_{t\in\mathbb{T}_+}$ it follows that $\{z(t)\}_{t\geq \ell}$ is a Markov chain with transition probability $A_{i_\ell, i_{\ell+1}}$
between allowable states $(i_1, i_2, \cdots, i_{\ell-1}, i_\ell)$ and $(i_2, \cdots, i_\ell, i_{\ell+1})$. With state space $\mathcal{Z}$, the Markov chain $\{z(t)\}$
inherits irreducibility and aperiodicity from that of $\{p(t)\}$. Also, $\{z(t)\}$ is stationary from the stationarity of $\{p(t)\}$
with invariant distribution:

$$\mathbb{P}\left(z(t) = (i_1, i_2, \cdots, i_\ell)\right) = \frac{1}{N} \prod_{j=1}^{\ell-1} A_{i_j, i_{j+1}}, \quad (i_1, i_2, \cdots, i_\ell) \in \mathcal{Z},\ t \geq \ell,\ t \in \mathbb{T}_+ \qquad (134)$$
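The construction of the window chain $\{z(t)\}$ can be checked numerically on a toy example. The sketch below is built on assumptions that are not taken from the paper: a hypothetical 3-state symmetric (hence doubly stochastic) transition matrix `A`, and a candidate invariant law of product form, $\frac{1}{N}\prod_j A_{i_j, i_{j+1}}$, which is the natural guess when $\{p(t)\}$ has a uniform stationary distribution. It builds the state space of allowable windows and verifies that the candidate law is preserved by one step of the chain.

```python
from itertools import product

# Hypothetical symmetric, doubly stochastic transition matrix (assumption).
A = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]


def state_space(A, l):
    """All windows (i_1, ..., i_l) whose consecutive transitions have positive probability."""
    n = len(A)
    return [z for z in product(range(n), repeat=l)
            if all(A[z[j]][z[j + 1]] > 0 for j in range(l - 1))]


def candidate_invariant(A, l):
    """Product-form law (1/N) * prod_j A[i_j][i_{j+1}] on the window state space."""
    n = len(A)
    pi = {}
    for z in state_space(A, l):
        mass = 1.0 / n
        for j in range(l - 1):
            mass *= A[z[j]][z[j + 1]]
        pi[z] = mass
    return pi


def step_distribution(A, dist):
    """Push a distribution over windows one step forward: the window
    (i_1, ..., i_l) moves to (i_2, ..., i_l, k) with probability A[i_l][k]."""
    out = {z: 0.0 for z in dist}
    for z, mass in dist.items():
        for k, a in enumerate(A[z[-1]]):
            if a > 0:
                out[z[1:] + (k,)] += mass * a
    return out


if __name__ == "__main__":
    pi = candidate_invariant(A, 3)
    nxt = step_distribution(A, pi)
    print(len(pi), sum(pi.values()), max(abs(nxt[z] - pi[z]) for z in pi))
```

The invariance check relies on the column sums of `A` being one; for a transition matrix that is not doubly stochastic, the factor $1/N$ would have to be replaced by the stationary probability of the first window entry.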



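Denote the hitting time $\tau$ of $\{z(t)\}$ to the state $(n_1, \cdots, n_\ell)$ by:

$$\tau = \min\left\{t > \ell \mid z(t) = (n_1, \cdots, n_\ell)\right\} \qquad (135)$$

and for all $z \in \mathcal{Z}$ define

$$\mathbb{P}_z(\tau > s) = \mathbb{P}\left(\tau > s \mid z(\ell) = z\right) \qquad (136)$$

Also, for each $t > \ell$ and $J$ sufficiently large, define the stopping times

$$\tau_t^J = \min\left\{t' \geq t - k(J) \mid z(t') = (n_1, \cdots, n_\ell)\right\} \qquad (137)$$

From the Markov property it then follows

$$\mathbb{P}\left(\tau_t^J > t \mid z(t - k(J) - 1) = z\right) = \mathbb{P}_z\left(\tau > k(J) + 1\right) \qquad (138)$$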

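It then follows successively

$$\begin{aligned}
\mathbb{P}\left(\text{no } (n_1, \cdots, n_\ell) \text{ cycle exists in } [t - k(J), t]\right) &= \mathbb{P}\left(\tau_t^J > t\right) \\
&= \sum_{z\in\mathcal{Z}} \left[\mathbb{P}\left(z(t - k(J) - 1) = z\right) \mathbb{P}\left(\tau_t^J > t \mid z(t - k(J) - 1) = z\right)\right] \\
&= \sum_{z\in\mathcal{Z}} \mathbb{P}\left(z(t - k(J) - 1) = z\right) \mathbb{P}_z\left(\tau > k(J) + 1\right) \qquad (139)
\end{aligned}$$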

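Since the above development holds for all $t \geq t_0$ for some sufficiently large $t_0$, we conclude from eqn. (126)

$$\sup_{t\geq t_0} \mathbb{P}\left(\|P(t)\| > J\right) \leq \sum_{z\in\mathcal{Z}} \mathbb{P}\left(z(t - k(J) - 1) = z\right) \mathbb{P}_z\left(\tau > k(J) + 1\right) \qquad (140)$$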



The recurrence (in fact positive recurrence) of the finite state Markov chain $\{z(t)\}$ and the fact that $k(J) \to \infty$ as
$J \to \infty$ imply, for all $z \in \mathcal{Z}$,

$$\lim_{J\to\infty} \mathbb{P}_z\left(\tau > k(J) + 1\right) = 0 \qquad (141)$$
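The decay of the hitting-time tail in eqn. (141) can be observed on a small example. The sketch below is an illustration under assumed data: a hypothetical 3-state transition matrix `A` and target window `(0, 1)`, neither of which comes from the paper. It computes exactly the probability that the window chain, started from a given window, avoids the target during its next `k` steps, by propagating only the probability mass that has not yet hit the target.

```python
# Hypothetical irreducible, aperiodic transition matrix (assumption).
A = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]


def tail_prob(A, target, start, k):
    """Probability that the window chain started at `start` does not
    visit `target` during its next k steps (visits are counted strictly
    after the starting time)."""
    alive = {start: 1.0}          # mass on paths that have avoided the target so far
    for _ in range(k):
        nxt = {}
        for z, mass in alive.items():
            for j, a in enumerate(A[z[-1]]):
                if a > 0:
                    z2 = z[1:] + (j,)
                    if z2 != target:          # drop paths that hit the target
                        nxt[z2] = nxt.get(z2, 0.0) + mass * a
        alive = nxt
    return sum(alive.values())


if __name__ == "__main__":
    for k in (0, 1, 5, 20, 100):
        print(k, tail_prob(A, target=(0, 1), start=(2, 2), k=k))
```

For a finite irreducible chain the surviving mass decays geometrically in `k`, so evaluating it at `k = k(J) + 1` and letting `J` grow drives it to zero.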

Since $\mathcal{Z}$ is finite, letting $J \to \infty$ in eqn. (140) leads to

$$\lim_{J\to\infty} \sup_{t\geq t_0} \mathbb{P}\left(\|P(t)\| > J\right) = 0 \qquad (142)$$

by the dominated convergence theorem and the Lemma follows.